sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:50:22 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
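For a sense of scale, `-mem:lat 91 10` and `-mem:width 8` can be turned into a per-block memory latency. The sketch below assumes the stock sim-outorder memory model, in which a transfer pays the first-chunk latency once and the inter-chunk latency for every further bus-width chunk. The `-mem:maxBurstLength`/`-mem:minBurstLength` options are not modeled here, so treat the result as an approximation. The function name is hypothetical.

```python
def mem_access_latency(block_size: int, first_chunk: int,
                       inter_chunk: int, bus_width: int) -> int:
    """Cycles to move one block from memory (assumed stock model)."""
    # Number of bus-width chunks in the block, rounded up.
    chunks = (block_size + bus_width - 1) // bus_width
    # The first chunk pays the full latency; each later chunk pays inter_chunk.
    return first_chunk + inter_chunk * (chunks - 1)

# 64-byte ul2 blocks, -mem:lat 91 10, -mem:width 8
print(mem_access_latency(64, 91, 10, 8))  # 161
```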

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced, and those that start with `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  A
  second argument prefixed with `+' is relative to the first argument,
  e.g., 1000:+100 == 1000:1100.  Program symbols may be used in all
  contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
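The range grammar above can be sketched as a small parser. This is an illustrative reimplementation, not sim-outorder's own code: `parse_range`, `Bound`, and the `symbols` dictionary (a stand-in for the program's symbol table) are hypothetical, and a `+` end is assumed to inherit the kind (address, cycle, or instruction count) of the start. The address used for `main` below is made up for the example.

```python
from typing import NamedTuple, Optional, Tuple

class Bound(NamedTuple):
    kind: str               # "addr" (@), "cycle" (#), or "insn" (no prefix)
    value: int
    relative: bool = False  # True only while a '+' end is being resolved

def _parse_value(text: str, symbols: dict) -> int:
    # Decimal or 0x-prefixed numbers; anything else is looked up in a
    # caller-supplied (hypothetical) symbol table standing in for program symbols.
    try:
        return int(text, 0)
    except ValueError:
        return symbols[text]

def _parse_bound(text: str, symbols: dict, allow_plus: bool) -> Optional[Bound]:
    if not text:
        return None  # this end of the range was omitted
    if text[0] == "@":
        return Bound("addr", _parse_value(text[1:], symbols))
    if text[0] == "#":
        return Bound("cycle", _parse_value(text[1:], symbols))
    if text[0] == "+" and allow_plus:
        return Bound("insn", _parse_value(text[1:], symbols), relative=True)
    return Bound("insn", _parse_value(text, symbols))

def parse_range(spec: str, symbols: Optional[dict] = None) -> Tuple[Optional[Bound], Optional[Bound]]:
    symbols = symbols or {}
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _parse_bound(start_text, symbols, allow_plus=False)
    end = _parse_bound(end_text, symbols, allow_plus=True)
    if end is not None and end.relative:
        if start is None:
            raise ValueError("a '+' end needs an explicit start")
        # e.g. 1000:+100 == 1000:1100; the end inherits the start's kind
        end = Bound(start.kind, start.value + end.value)
    return start, end

print(parse_range("@main:+278", {"main": 0x400200}))
```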

  Branch predictor configuration examples for the 2-level predictor:
    Configurations:   N, M, W, X  (the order -bpred:2lev takes them)
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (values listed in N, W, M, X order):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
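Because `-bpred:2lev` takes its arguments as N M W X (`<l1size> <l2size> <hist_size> <xor>`) while the sample table lists them as N, W, M, X, a small helper that always emits the flag order can avoid transposition mistakes. `bpred_2lev_args` is a hypothetical helper; for GAp and PAp the caller supplies M, and only the GAp constraint M > 2^W is checked. The GAp call below reproduces the default `-bpred:2lev 1 1024 8 0` shown in the option listing.

```python
def bpred_2lev_args(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Build '-bpred:2lev <l1size> <l2size> <hist_size> <xor>' (N M W X order)."""
    if scheme == "GAg":        # 1, W, 2^W, 0
        l1, l2, xor = 1, 2 ** w, 0
    elif scheme == "GAp":      # 1, W, M (M > 2^W), 0
        if m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        l1, l2, xor = 1, m, 0
    elif scheme == "PAg":      # N, W, 2^W, 0
        l1, l2, xor = n, 2 ** w, 0
    elif scheme == "PAp":      # N, W, M, 0 (M chosen by the caller)
        if m <= 0:
            raise ValueError("PAp needs an explicit M")
        l1, l2, xor = n, m, 0
    elif scheme == "gshare":   # 1, W, 2^W, 1
        l1, l2, xor = 1, 2 ** w, 1
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    # Emit in the order the flag expects: N, M, W, X.
    return f"-bpred:2lev {l1} {l2} {w} {xor}"

print(bpred_2lev_args("GAp", 8, m=1024))  # -bpred:2lev 1 1024 8 0
```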

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
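A <config> string fixes a cache's capacity as nsets x assoc x bsize. The sketch below (hypothetical names) parses the format and computes that product. For a TLB entry such as `dtlb:32:4096:4:l` the block size is the page size, so the same product gives the TLB reach, assuming 4096-byte pages as in the configs above.

```python
from typing import NamedTuple

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}

class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # sets x ways x bytes per block
        return self.nsets * self.assoc * self.bsize

def parse_cache_config(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)

for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KiB", REPL_POLICIES[cfg.repl])
```

For this log, that gives 64 KiB for dl1 and il1, 512 KiB for ul2, and a 512 KiB reach for the dtlb.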

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
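How the `dl1`/`dl2` pointers collapse levels can be sketched as a name resolver. This is an illustration, not SimpleScalar code: `resolve_hierarchy` is hypothetical, it treats a level missing from the dictionary as `none` (sim-outorder has its own defaults), and it enforces only the restriction visible in the option help (il2 accepts `dl2` but not `dl1`). Applied to this run's options, it maps il2 onto ul2, which matches the statistics reporting il1, dl1, and ul2 but no separate il2.

```python
def resolve_hierarchy(opts: dict) -> dict:
    """Map each cache level to the name of the cache that actually serves it."""
    resolved = {}
    # Data-side levels always carry their own <config> (or 'none').
    for level in ("dl1", "dl2"):
        spec = opts.get(level, "none")
        resolved[level] = None if spec == "none" else spec.split(":")[0]
    # Instruction-side levels may instead name a data-side level to share it.
    for level in ("il1", "il2"):
        spec = opts.get(level, "none")
        if spec == "none":
            resolved[level] = None
        elif spec in ("dl1", "dl2"):
            if level == "il2" and spec != "dl2":
                raise ValueError("il2 can only be unified with dl2")
            resolved[level] = resolved[spec]
        else:
            resolved[level] = spec.split(":")[0]
    return resolved

this_run = {"il1": "il1:512:64:2:l", "il2": "dl2",
            "dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l"}
print(resolve_hierarchy(this_run))
# {'dl1': 'dl1', 'dl2': 'ul2', 'il1': 'il1', 'il2': 'ul2'}
```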



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6975830 # total simulation time in cycles
sim_IPC                      1.8789 # instructions per cycle
sim_CPI                      0.5322 # cycles per instruction
sim_exec_BW                  1.8849 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25959568 # cumulative IFQ occupancy
IFQ_fcount                  6364289 # cumulative IFQ full count
ifq_occupancy                3.7214 # avg IFQ occupancy (insn's)
ifq_rate                     1.8849 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9743 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 107123183 # cumulative RUU occupancy
RUU_fcount                  5800450 # cumulative RUU full count
ruu_occupancy               15.3563 # avg RUU occupancy (insn's)
ruu_rate                     1.8849 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.1469 # avg RUU occupant latency (cycle's)
ruu_full                     0.8315 # fraction of time (cycle's) RUU was full
LSQ_count                  32798890 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7018 # avg LSQ occupancy (insn's)
lsq_rate                     1.8849 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4944 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  156971709 # total number of slip cycles
avg_sim_slip                11.9761 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
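The derived lines in the matrix run's statistics are plain ratios of the raw counters, and the printed values can be reproduced from them. The sketch below is an illustration, not sim-outorder's source. The helper names are hypothetical, and the formulas (for example, queue latency as occupancy divided by dispatch rate, i.e. Little's law, and avg_sim_slip as slip cycles per committed instruction) are inferred from the counter comments and checked only against the values printed here. A zero denominator is returned as None where sim-outorder prints `<error: divide by zero>`.

```python
from typing import Optional

def ratio(num: float, den: float) -> Optional[float]:
    # sim-outorder prints "<error: divide by zero>" here; the sketch returns None.
    return num / den if den else None

# Raw counters copied from the matrix run above.
NUM_INSN, TOTAL_INSN, CYCLES, BRANCHES = 13107124, 13148990, 6975830, 1010850

summary = {
    "sim_IPC": ratio(NUM_INSN, CYCLES),        # committed insts per cycle
    "sim_CPI": ratio(CYCLES, NUM_INSN),        # reciprocal of IPC
    "sim_exec_BW": ratio(TOTAL_INSN, CYCLES),  # executed (incl. mis-speculated) per cycle
    "sim_IPB": ratio(NUM_INSN, BRANCHES),      # committed insts per committed branch
    "avg_sim_slip": ratio(156971709, NUM_INSN),
}

def queue_stats(count: int, full_count: int) -> dict:
    occupancy = count / CYCLES        # average entries held per cycle
    rate = TOTAL_INSN / CYCLES        # entries dispatched per cycle
    return {
        "occupancy": occupancy,
        "latency": occupancy / rate,  # Little's law: average cycles per entry
        "full": full_count / CYCLES,  # fraction of cycles the queue was full
    }

queues = {
    "ifq": queue_stats(25959568, 6364289),
    "ruu": queue_stats(107123183, 5800450),
    "lsq": queue_stats(32798890, 0),
}

bpred = {
    "bpred_addr_rate": ratio(1000567, 1010850),  # addr_hits / updates
    "bpred_dir_rate": ratio(1000659, 1010850),   # dir_hits / updates
    "bpred_jr_rate": ratio(46, 52),              # jr_hits / jr_seen
    "bpred_jr_non_ras_rate": ratio(0, 0),        # no non-RAS JRs were seen
    "ras_rate": ratio(46, 52),                   # ras_hits / used_ras
}

for name, value in {**summary, **bpred}.items():
    shown = "<error: divide by zero>" if value is None else f"{value:.4f}"
    print(f"{name:22s} {shown}")
```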

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:50:30 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265022 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958479.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6581908 # total simulation time in cycles
sim_IPC                      1.7596 # instructions per cycle
sim_CPI                      0.5683 # cycles per instruction
sim_exec_BW                  1.8634 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  19567961 # cumulative IFQ occupancy
IFQ_fcount                  4063831 # cumulative IFQ full count
ifq_occupancy                2.9730 # avg IFQ occupancy (insn's)
ifq_rate                     1.8634 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5954 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6174 # fraction of time (cycle's) IFQ was full
RUU_count                  80519738 # cumulative RUU occupancy
RUU_fcount                  3461830 # cumulative RUU full count
ruu_occupancy               12.2335 # avg RUU occupancy (insn's)
ruu_rate                     1.8634 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.5650 # avg RUU occupant latency (cycle's)
ruu_full                     0.5260 # fraction of time (cycle's) RUU was full
LSQ_count                  32860458 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.9925 # avg LSQ occupancy (insn's)
lsq_rate                     1.8634 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6792 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  127447305 # total number of slip cycles
avg_sim_slip                11.0044 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820213 # total number of accesses
il1.hits                   12819996 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820213 # total number of accesses
itlb.hits                  12820206 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917910 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
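The mem.* counters above are related by simple arithmetic. The Python sketch below recomputes mem.page_mem and mem.ptab_miss_rate from this run's raw counts. The helper names are hypothetical. The 4 KiB page size is inferred from the numbers (292k / 73 pages) and was not read from the simulator source.

```python
# Hypothetical helpers that recompute the mem.* derived statistics.
PAGE_SIZE = 4096  # assumed bytes per simulated page (inferred: 292k / 73 pages = 4 KiB)


def page_mem_kib(page_count: int, page_size: int = PAGE_SIZE) -> int:
    """Total allocated page memory in KiB, as reported by mem.page_mem."""
    return page_count * page_size // 1024


def ptab_miss_rate(misses: int, accesses: int) -> float:
    """First-level page table miss rate: misses / accesses."""
    return misses / accesses


if __name__ == "__main__":
    # Counts copied from the run above.
    print(f"mem.page_mem       {page_mem_kib(73)}k")
    print(f"mem.ptab_miss_rate {ptab_miss_rate(589, 85917910):.4f}")
```

The same formula also matches the later runs: 190 pages give 760k, and 30 pages give 120k.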

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:50:38 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
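The cache and TLB options above use the format <name>:<nsets>:<bsize>:<assoc>:<repl> described in the sim-outorder help text. Total size is sets x block size x ways. For a TLB, the "block size" is the page size, so the same product gives the TLB reach. The Python sketch below parses these strings and reports each structure's size. The class and function names are hypothetical and are not part of SimpleScalar.

```python
from dataclasses import dataclass

# Replacement-policy letters, as listed in the sim-outorder help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        # total number of blocks (or TLB entries): sets x ways
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        # data capacity for a cache, or address reach for a TLB
        return self.blocks * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse a <name>:<nsets>:<bsize>:<assoc>:<repl> string."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields: {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r} in {spec!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for spec in ("il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name:5s} {cfg.blocks:5d} blocks/entries "
              f"{cfg.capacity_bytes // 1024:4d} KiB {REPL_POLICIES[cfg.repl]}")
```

For this configuration, the sketch reports:
- 64 KiB for each L1 cache.
- 512 KiB for the unified L2.
- 256 KiB of reach for the instruction TLB (64 entries of 4 KiB pages).
- 512 KiB of reach for the data TLB (128 entries).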

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375753 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10851522 # total simulation time in cycles
sim_IPC                      1.2276 # instructions per cycle
sim_CPI                      0.8146 # cycles per instruction
sim_exec_BW                  1.2326 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  42207688 # cumulative IFQ occupancy
IFQ_fcount                 10401278 # cumulative IFQ full count
ifq_occupancy                3.8896 # avg IFQ occupancy (insn's)
ifq_rate                     1.2326 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.1555 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9585 # fraction of time (cycle's) IFQ was full
RUU_count                 169761993 # cumulative RUU occupancy
RUU_fcount                 10260202 # cumulative RUU full count
ruu_occupancy               15.6441 # avg RUU occupancy (insn's)
ruu_rate                     1.2326 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.6918 # avg RUU occupant latency (cycle's)
ruu_full                     0.9455 # fraction of time (cycle's) RUU was full
LSQ_count                  90794666 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3670 # avg LSQ occupancy (insn's)
lsq_rate                     1.2326 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.7880 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  280270416 # total number of slip cycles
avg_sim_slip                21.0399 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
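Most of the core and queue statistics of the fft run are ratios of the raw counters above. The formulas in the Python sketch below are inferred from the stat descriptions. Applied to the fft counters, they reproduce the printed values to four decimals. The helper name and dictionary are hypothetical.

```python
# Raw counters copied from the fft run's statistics above.
FFT = {
    "sim_num_insn": 13320908,
    "sim_num_branches": 387182,
    "sim_total_insn": 13375753,
    "sim_cycle": 10851522,
    "IFQ_count": 42207688,
    "IFQ_fcount": 10401278,
    "RUU_count": 169761993,
    "RUU_fcount": 10260202,
    "LSQ_count": 90794666,
    "LSQ_fcount": 0,
    "sim_slip": 280270416,
}


def derived_metrics(s: dict) -> dict:
    """Recompute derived statistics from raw counters (formulas inferred, not taken from the simulator source)."""
    cycles = s["sim_cycle"]
    committed = s["sim_num_insn"]
    executed = s["sim_total_insn"]  # committed plus mis-speculated instructions
    metrics = {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "sim_exec_BW": executed / cycles,
        "sim_IPB": committed / s["sim_num_branches"],
        "avg_sim_slip": s["sim_slip"] / committed,
    }
    for q in ("IFQ", "RUU", "LSQ"):
        count = s[f"{q}_count"]
        metrics[f"{q.lower()}_occupancy"] = count / cycles     # avg entries per cycle
        metrics[f"{q.lower()}_latency"] = count / executed     # avg cycles per entry
        metrics[f"{q.lower()}_full"] = s[f"{q}_fcount"] / cycles  # fraction of cycles full
    return metrics


if __name__ == "__main__":
    for name, value in derived_metrics(FFT).items():
        print(f"{name:14s} {value:10.4f}")
```

Each queue's latency equals its occupancy divided by its dispatch rate (sim_total_insn / sim_cycle, which is why ifq_rate, ruu_rate, and lsq_rate all equal sim_exec_BW). This is a Little's-law style relation.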

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:50:50 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
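The option -mem:lat 91 10 gives the latency of the first memory chunk and of each later chunk. The option -mem:width 8 gives the bus width in bytes.

The Python sketch below assumes the memory model of the stock sim-outorder, as we understand it. A block moves in ceil(block size / bus width) chunks. The first chunk costs <first_chunk> cycles, and every later chunk costs <inter_chunk> cycles.

This build also accepts -mem:maxBurstLength and -mem:minBurstLength. Their effect is not documented in this log, so the sketch ignores them, and the result may not match this simulator exactly. The function name is hypothetical.

```python
def mem_access_latency(blk_sz: int, bus_width: int = 8,
                       first_chunk: int = 91, inter_chunk: int = 10) -> int:
    """Cycles to transfer one block from main memory.

    Assumption: stock sim-outorder chunked model; the
    -mem:maxBurstLength/-mem:minBurstLength options are NOT modeled.
    """
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_sz + bus_width - 1) // bus_width  # round up to whole bus transfers
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # ul2 uses 64-byte blocks (ul2:1024:64:8:l) on an 8-byte memory bus
    print(mem_access_latency(64))  # prints 161 (8 chunks: 91 + 7 * 10)
```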

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12755844 # total simulation time in cycles
sim_IPC                      1.6760 # instructions per cycle
sim_CPI                      0.5966 # cycles per instruction
sim_exec_BW                  1.6816 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49818054 # cumulative IFQ occupancy
IFQ_fcount                 11835341 # cumulative IFQ full count
ifq_occupancy                3.9055 # avg IFQ occupancy (insn's)
ifq_rate                     1.6816 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3225 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9278 # fraction of time (cycle's) IFQ was full
RUU_count                 202412714 # cumulative RUU occupancy
RUU_fcount                 12615216 # cumulative RUU full count
ruu_occupancy               15.8682 # avg RUU occupancy (insn's)
ruu_rate                     1.6816 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.4364 # avg RUU occupant latency (cycle's)
ruu_full                     0.9890 # fraction of time (cycle's) RUU was full
LSQ_count                  64871823 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0857 # avg LSQ occupancy (insn's)
lsq_rate                     1.6816 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0243 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  295066864 # total number of slip cycles
avg_sim_slip                13.8015 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
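
Every statistic in these dumps uses the same layout, <name> <value> # <description>, so the counters can be pulled out of the log mechanically. The Python sketch below is a hypothetical helper; STAT_LINE, parse_value and parse_stats are names invented here and are not part of SimpleScalar. It assumes stat names start with a letter, which keeps option lines such as -seed and commented lines such as "# -config" from matching. Values it cannot convert, such as 120k or <error: divide by zero>, are kept as raw strings.

```python
import re

# A stat line: a name starting with a letter, a value, then "# description".
STAT_LINE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>\S.*?)\s+#\s?(?P<desc>.*)$")


def parse_value(text):
    """Convert a raw value to int or float where possible, else keep the string."""
    if text.lower().startswith("0x"):
        return int(text, 16)  # segment bases such as 0x00400000
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text  # e.g. "120k" or "<error: divide by zero>"


def parse_stats(lines):
    """Return {name: value} for every line that looks like a statistic."""
    stats = {}
    for line in lines:
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m.group("name")] = parse_value(m.group("value"))
    return stats
```

Ratios such as ul2.miss_rate can then be re-derived from the raw counters; for example, 2489 / 4101 rounds to the 0.6069 printed above.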

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:51:04 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
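
The cache and TLB options above all use the <name>:<nsets>:<bsize>:<assoc>:<repl> format described in the simulator's help text, so each size is the product of sets, block size and ways. A minimal Python sketch, assuming (as in stock SimpleScalar) that a TLB's block size is its page size, so the same product gives the TLB's reach; parse_cache_config and capacity_bytes are hypothetical names:

```python
def parse_cache_config(spec):
    """Split a <name>:<nsets>:<bsize>:<assoc>:<repl> string into typed fields."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    return name, int(nsets), int(bsize), int(assoc), repl


def capacity_bytes(spec):
    """Sets * block size * ways; for a TLB this is the reach (entries * page size)."""
    _, nsets, bsize, assoc, _ = parse_cache_config(spec)
    return nsets * bsize * assoc


# Settings copied from the option dump above.
configs = {
    "dl1": "dl1:512:64:2:l",
    "il1": "il1:512:64:2:l",
    "ul2": "ul2:1024:64:8:l",
    "itlb": "itlb:16:4096:4:l",
    "dtlb": "dtlb:32:4096:4:l",
}
for label, spec in configs.items():
    print(f"{label}: {capacity_bytes(spec) // 1024} KiB")
# dl1: 64 KiB
# il1: 64 KiB
# ul2: 512 KiB
# itlb: 256 KiB
# dtlb: 512 KiB
```

With this configuration each L1 holds 64 KiB, the unified L2 holds 512 KiB, and the 128-entry data TLB (32 sets of 4 ways) maps 512 KiB of 4 KiB pages.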

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861883 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  41186834 # total simulation time in cycles
sim_IPC                      0.6764 # instructions per cycle
sim_CPI                      1.4784 # cycles per instruction
sim_exec_BW                  0.6765 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 164605305 # cumulative IFQ occupancy
IFQ_fcount                 41151086 # cumulative IFQ full count
ifq_occupancy                3.9966 # avg IFQ occupancy (insn's)
ifq_rate                     0.6765 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.9079 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9991 # fraction of time (cycle's) IFQ was full
RUU_count                 658427351 # cumulative RUU occupancy
RUU_fcount                 41150192 # cumulative RUU full count
ruu_occupancy               15.9864 # avg RUU occupancy (insn's)
ruu_rate                     0.6765 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.6318 # avg RUU occupant latency (cycle's)
ruu_full                     0.9991 # fraction of time (cycle's) RUU was full
LSQ_count                 200033358 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8567 # avg LSQ occupancy (insn's)
lsq_rate                     0.6765 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.1795 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  894969052 # total number of slip cycles
avg_sim_slip                32.1242 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
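
Most of the derived figures in this dump are ratios of raw counters printed alongside them. The Python sketch below re-derives several of them for the alphaBlend run. The formulas are inferred from the printed values and their descriptions rather than quoted from the simulator source, and the variable names simply mirror the stat names.

```python
# Counters copied from the alphaBlend run above.
sim_num_insn = 27859692      # committed instructions
sim_total_insn = 27861883    # executed instructions (committed + mis-speculated)
sim_cycle = 41186834
IFQ_count = 164605305        # IFQ occupancy summed over all cycles
IFQ_fcount = 41151086        # cycles in which the IFQ was full
RUU_count = 658427351        # RUU occupancy summed over all cycles
sim_slip = 894969052         # total issue-to-retirement slip cycles

ipc = sim_num_insn / sim_cycle
cpi = sim_cycle / sim_num_insn
ifq_occupancy = IFQ_count / sim_cycle       # average entries held per cycle
ifq_rate = sim_total_insn / sim_cycle       # instructions passing through per cycle
ifq_latency = IFQ_count / sim_total_insn    # average cycles an instruction waits
ifq_full = IFQ_fcount / sim_cycle           # fraction of cycles the IFQ was full
ruu_occupancy = RUU_count / sim_cycle
ruu_latency = RUU_count / sim_total_insn
avg_sim_slip = sim_slip / sim_num_insn

print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  IFQ latency {ifq_latency:.4f}")
# IPC 0.6764  CPI 1.4784  IFQ latency 5.9079
```

Because ifq_latency is the cumulative occupancy divided by the instruction count, occupancy = rate * latency holds exactly (Little's law applied to the fetch queue); the same relation links ruu_occupancy, ruu_rate and ruu_latency.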

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:51:28 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
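
This configuration differs from the alphaBlend run above only in -mem:width (16 bytes instead of 8). One rough way to see what that changes is to estimate how long one 64-byte ul2 block takes to cross the memory bus under -mem:lat 91 10. The Python sketch below assumes the stock SimpleScalar model: a block is split into bus-width chunks, the first chunk costs the first latency and each later chunk costs the inter-chunk latency. The -mem:minBurstLength and -mem:maxBurstLength options are not covered by that model, so their effect is ignored here; block_fetch_latency is a hypothetical name.

```python
def block_fetch_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Estimated cycles to transfer one cache block over the memory bus.

    Assumes ceil(block_bytes / bus_width) chunks: the first costs
    first_chunk cycles and every following chunk costs inter_chunk cycles.
    Burst-length limits are deliberately not modelled.
    """
    chunks = (block_bytes + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + (chunks - 1) * inter_chunk


# ul2 blocks are 64 bytes; -mem:lat 91 10
for width in (8, 16):
    print(width, block_fetch_latency(64, width, 91, 10))
# 8 161
# 16 121
```

Under these assumptions, doubling the bus width cuts a 64-byte refill from 161 to 121 cycles.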

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6897310 # total simulation time in cycles
sim_IPC                      1.9003 # instructions per cycle
sim_CPI                      0.5262 # cycles per instruction
sim_exec_BW                  1.9064 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25673488 # cumulative IFQ occupancy
IFQ_fcount                  6292769 # cumulative IFQ full count
ifq_occupancy                3.7222 # avg IFQ occupancy (insn's)
ifq_rate                     1.9064 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9525 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105975743 # cumulative RUU occupancy
RUU_fcount                  5728930 # cumulative RUU full count
ruu_occupancy               15.3648 # avg RUU occupancy (insn's)
ruu_rate                     1.9064 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0596 # avg RUU occupant latency (cycle's)
ruu_full                     0.8306 # fraction of time (cycle's) RUU was full
LSQ_count                  32415530 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6997 # avg LSQ occupancy (insn's)
lsq_rate                     1.9064 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4652 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  155441629 # total number of slip cycles
avg_sim_slip                11.8593 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
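
The `*_rate` lines above are plain ratios of the raw counters (the comments define miss_rate as misses/ref, and so on). As a minimal sketch, the matrix run's rates can be recomputed in Python. This excerpt does not include `dl1.accesses` for this run, so the sketch assumes that references = hits + misses. It also assumes 4 KB simulator pages for `mem.page_mem`, matching the 4096-byte page size in the TLB configs.

```python
def rate(events: int, refs: int) -> float:
    """events/refs, the definition given in the *_rate comments."""
    return events / refs if refs else 0.0


# Counters copied from the matrix run above.
dl1_hits, dl1_misses, dl1_repls, dl1_wbs = 4007447, 5727, 4703, 492
ul2_accesses, ul2_misses = 6397, 2268
ptab_accesses, ptab_misses = 77357593, 7696533
page_count = 24

# Assumption: each dl1 reference is either a hit or a miss.
dl1_refs = dl1_hits + dl1_misses

stats = {
    "dl1.miss_rate": rate(dl1_misses, dl1_refs),
    "dl1.repl_rate": rate(dl1_repls, dl1_refs),
    "dl1.wb_rate": rate(dl1_wbs, dl1_refs),
    "ul2.miss_rate": rate(ul2_misses, ul2_accesses),
    "mem.ptab_miss_rate": rate(ptab_misses, ptab_accesses),
}
for name, value in stats.items():
    print(f"{name:<20} {value:.4f}")

# Assumption: 4 KB pages, so total page memory is page_count * 4 KB.
page_mem_kb = page_count * 4
print(f"mem.page_mem         {page_mem_kb}k")
```

Rounded to four places, each value matches the corresponding line printed by the simulator.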

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:51:36 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
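
Together, `-mem:lat 91 10` and `-mem:width` determine the cost of filling one ul2 block from memory. The following sketch assumes the stock SimpleScalar 3.0 model: first-chunk latency plus the inter-chunk latency for each additional bus-width chunk. It ignores `-mem:maxBurstLength` and `-mem:minBurstLength`, because this log does not document their semantics. `mem_access_latency` is a hypothetical helper name. The head of this log shows that the matrix run used `-mem:width 8`, while the sort and fft runs use 16.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block over the memory bus (assumed model)."""
    # Number of bus-width transfers needed for one block, rounded up.
    chunks = (block_size + bus_width - 1) // bus_width
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (-cache:dl2 ul2:1024:64:8:l); -mem:lat 91 10.
for width in (8, 16):
    print(f"-mem:width {width:>2}: {mem_access_latency(64, width, 91, 10)} cycles per ul2 miss")
```

Under this model, doubling the bus width halves the number of chunks. Only the inter-chunk part of the latency shrinks, because the 91-cycle first-chunk cost is unchanged.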

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
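
A small parser can read the range grammar above. The following is a minimal Python sketch built on several assumptions. Numbers are parsed with `int(x, 0)`, so both decimal and 0x-prefixed values are accepted. A `+` end inherits the kind of the start value. Program symbols such as `main` are not resolved. `Bound`, `parse_bound` and `parse_range` are hypothetical names, not SimpleScalar functions.

```python
from typing import NamedTuple, Optional, Tuple


class Bound(NamedTuple):
    kind: str   # "addr" for '@', "cycle" for '#', "insn" for a bare number
    value: int


def parse_bound(text: str, start: Optional[Bound] = None) -> Optional[Bound]:
    """Parse one side of a range; an empty side means 'open'."""
    if not text:
        return None
    prefix, body = text[0], text[1:]
    if prefix == "@":
        return Bound("addr", int(body, 0))
    if prefix == "#":
        return Bound("cycle", int(body, 0))
    if prefix == "+":
        if start is None:
            raise ValueError("a '+' value needs a start value to be relative to")
        # Assumption: a relative end keeps the kind of the start value.
        return Bound(start.kind, start.value + int(body, 0))
    # Bare number: instruction count. Symbols (e.g. "main") are not resolved
    # in this sketch, so int() raises ValueError for them.
    return Bound("insn", int(text, 0))


def parse_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':' separator")
    start = parse_bound(start_text)
    return start, parse_bound(end_text, start)


for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
    print(spec, parse_range(spec))
```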

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (values listed in N, W, M, X order):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
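
The sample predictors above list their values as N, W, M, X. However, the `-bpred:2lev` option expects `<l1size> <l2size> <hist_size> <xor>`, which is N, M, W, X. The following sketch reorders a sample into flag order and names a flag setting using the sample rules. PAp is left out. `sample_to_flag` and `classify` are hypothetical helpers. By these rules, the `-bpred:2lev 1 1024 8 0` default in the option listing is a GAp. The runs in this log use `-bpred bimod`, so that setting is unused here.

```python
from typing import Optional


def sample_to_flag(n: int, w: int, m: int, x: int) -> str:
    """Reorder a sample tuple (N, W, M, X) into -bpred:2lev order N M W X."""
    return f"-bpred:2lev {n} {m} {w} {x}"


def classify(l1size: int, l2size: int, hist_size: int, xor: int) -> Optional[str]:
    """Name a -bpred:2lev setting using the sample rules (PAp not handled)."""
    history_patterns = 2 ** hist_size   # 2^W distinct history values
    if l1size == 1:
        if l2size == history_patterns:
            return "gshare" if xor else "GAg"
        if l2size > history_patterns and not xor:
            return "GAp"
    elif l2size == history_patterns and not xor:
        return "PAg"
    return None


print(sample_to_flag(1, 12, 2 ** 12, 1))   # gshare with a 12-bit history
print(classify(1, 1024, 8, 0))             # default from the option listing
```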

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
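
The capacity of a cache follows directly from its config string: nsets x assoc x bsize. For example, every run in this log uses dl1:512:64:2:l, a 64 KB 2-way cache, and ul2:1024:64:8:l, a 512 KB 8-way cache. The following sketch parses the format and derives capacity and address-bit splits. It applies the same format to the `-tlb:itlb` and `-tlb:dtlb` values in the option listing, reading <bsize> as the page size. That reading is an interpretation. `CacheConfig` and `parse_cache_config` are hypothetical names.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


def _log2_exact(value: int, what: str) -> int:
    # Require a power of two so that the address bits split cleanly.
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{what} must be a power of two, got {value}")
    return value.bit_length() - 1


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total bytes held: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize

    @property
    def offset_bits(self) -> int:
        return _log2_exact(self.bsize, "bsize")

    @property
    def index_bits(self) -> int:
        return _log2_exact(self.nsets, "nsets")


def parse_cache_config(text: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.nsets * cfg.assoc} blocks/entries, "
          f"{cfg.capacity // 1024} KB, {cfg.offset_bits} offset bits, "
          f"{cfg.index_bits} index bits, {cfg.repl}")
```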

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
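
Every run in this log sets `-cache:il2 dl2`, so il1 misses and dl1 traffic share the one ul2. The following is a quick consistency check. It assumes that each il1 miss, each dl1 miss and each dl1 writeback reaches ul2 exactly once; that is an assumption about the model, not something this log documents. For the sort and fft runs in this log, the sum equals the reported `ul2.accesses` exactly. `expected_l2_accesses` is a hypothetical helper.

```python
def expected_l2_accesses(il1_misses: int, dl1_misses: int, dl1_writebacks: int) -> int:
    # Assumption: with "-cache:il2 dl2" both L1 caches feed the unified ul2,
    # and every L1 miss or dirty-block writeback is one ul2 access.
    return il1_misses + dl1_misses + dl1_writebacks


runs = {
    # run: (il1.misses, dl1.misses, dl1.writebacks, ul2.accesses) from this log
    "sort": (217, 11205, 7079, 18501),
    "fft": (727, 287722, 143444, 431893),
}
for run, (il1_m, dl1_m, dl1_wb, ul2_acc) in runs.items():
    predicted = expected_l2_accesses(il1_m, dl1_m, dl1_wb)
    print(f"{run}: predicted {predicted}, reported {ul2_acc}")
```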



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264861 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6472966 # total simulation time in cycles
sim_IPC                      1.7892 # instructions per cycle
sim_CPI                      0.5589 # cycles per instruction
sim_exec_BW                  1.8948 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19166273 # cumulative IFQ occupancy
IFQ_fcount                  3963409 # cumulative IFQ full count
ifq_occupancy                2.9610 # avg IFQ occupancy (insn's)
ifq_rate                     1.8948 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5627 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6123 # fraction of time (cycle's) IFQ was full
RUU_count                  78912154 # cumulative RUU occupancy
RUU_fcount                  3361448 # cumulative RUU full count
ruu_occupancy               12.1910 # avg RUU occupancy (insn's)
ruu_rate                     1.8948 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4340 # avg RUU occupant latency (cycle's)
ruu_full                     0.5193 # fraction of time (cycle's) RUU was full
LSQ_count                  32497591 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0205 # avg LSQ occupancy (insn's)
lsq_rate                     1.8948 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6497 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  125476854 # total number of slip cycles
avg_sim_slip                10.8342 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820212 # total number of accesses
il1.hits                   12819995 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820212 # total number of accesses
itlb.hits                  12820205 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917906 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
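
Most of the derived pipeline statistics in the sort run are ratios of the raw counters. IPC and IPB divide committed instructions by cycles and by branches, respectively. Queue occupancy divides the cumulative count by cycles, and queue latency divides it by executed instructions. Consequently, occupancy = dispatch rate x latency (Little's law); for the RUU, 1.8948 x 6.4340 is about 12.191. The formulas below are inferred from the stat descriptions and checked against the printed values. They are not quoted from the simulator source, and `derived_stats` is a hypothetical name.

```python
# Raw counters copied from the sort run above.
SORT = {
    "sim_num_insn": 11581513,
    "sim_total_insn": 12264861,
    "sim_num_branches": 3128464,
    "sim_cycle": 6472966,
    "sim_slip": 125476854,
    "IFQ_count": 19166273,
    "RUU_count": 78912154,
    "RUU_fcount": 3361448,
    "LSQ_count": 32497591,
    "bpred_updates": 3128464,
    "bpred_addr_hits": 2984765,
    "bpred_dir_hits": 2990777,
    "jr_hits": 427904,
    "jr_seen": 434901,
}


def derived_stats(s: dict) -> dict:
    """Recompute derived statistics from raw counters (inferred formulas)."""
    cycles = s["sim_cycle"]
    committed = s["sim_num_insn"]    # committed instructions only
    executed = s["sim_total_insn"]   # also counts mis-speculated instructions
    out = {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "sim_exec_BW": executed / cycles,
        "sim_IPB": committed / s["sim_num_branches"],
        "avg_sim_slip": s["sim_slip"] / committed,
        "ruu_full": s["RUU_fcount"] / cycles,
        "bpred_addr_rate": s["bpred_addr_hits"] / s["bpred_updates"],
        "bpred_dir_rate": s["bpred_dir_hits"] / s["bpred_updates"],
        "bpred_jr_rate": s["jr_hits"] / s["jr_seen"],
        # Assumption: misses = updates - direction hits (holds for this run).
        "bpred_misses": s["bpred_updates"] - s["bpred_dir_hits"],
    }
    for queue in ("IFQ", "RUU", "LSQ"):
        count = s[f"{queue}_count"]                          # occupancy summed over cycles
        out[f"{queue.lower()}_occupancy"] = count / cycles   # mean entries in use
        out[f"{queue.lower()}_latency"] = count / executed   # mean cycles per occupant
    return out


for name, value in derived_stats(SORT).items():
    print(f"{name:<18} {value:.4f}")
```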

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:51:44 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375433 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10100602 # total simulation time in cycles
sim_IPC                      1.3188 # instructions per cycle
sim_CPI                      0.7583 # cycles per instruction
sim_exec_BW                  1.3242 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  39318253 # cumulative IFQ occupancy
IFQ_fcount                  9678919 # cumulative IFQ full count
ifq_occupancy                3.8927 # avg IFQ occupancy (insn's)
ifq_rate                     1.3242 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.9396 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9583 # fraction of time (cycle's) IFQ was full
RUU_count                 158194881 # cumulative RUU occupancy
RUU_fcount                  9537922 # cumulative RUU full count
ruu_occupancy               15.6619 # avg RUU occupancy (insn's)
ruu_rate                     1.3242 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.8273 # avg RUU occupant latency (cycle's)
ruu_full                     0.9443 # fraction of time (cycle's) RUU was full
LSQ_count                  83911095 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3075 # avg LSQ occupancy (insn's)
lsq_rate                     1.3242 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2735 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  261820533 # total number of slip cycles
avg_sim_slip                19.6549 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
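Each statistic in the dump above is one `name value # description` record. Here is a minimal Python sketch that reads such lines. It assumes the value is always the second whitespace-separated field, and `parse_stats` is a hypothetical helper, not part of SimpleScalar. The sketch then cross-checks two relations that hold in the matrix run. First, each `*_rate` is the event count divided by accesses, printed to four decimals. Second, every ul2 access is either a dl1 miss, a dl1 writeback, or an il1 miss, because il2 is pointed at dl2.

```python
def parse_stats(text):
    """Parse 'name value # description' lines from a sim-outorder stats dump.

    Values that are not plain numbers (e.g. '760k' or
    '<error: divide by zero>') are kept as strings.
    """
    stats = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()  # drop the description
        if not body:
            continue
        name, _, value = body.partition(" ")
        value = value.strip()
        try:
            # int(value, 0) also accepts hex such as 0x00400000
            stats[name] = float(value) if "." in value else int(value, 0)
        except ValueError:
            stats[name] = value
    return stats

# Lines copied from the matrix run above.
MATRIX = """\
il1.misses                      727 # total number of misses
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
dl1.writebacks               143444 # total number of writebacks
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
ul2.accesses                 431893 # total number of accesses
ul2.misses                    32394 # total number of misses
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
"""

s = parse_stats(MATRIX)
# Rates are events / accesses, rounded to four decimals.
assert round(s["dl1.misses"] / s["dl1.accesses"], 4) == s["dl1.miss_rate"]
assert round(s["ul2.misses"] / s["ul2.accesses"], 4) == s["ul2.miss_rate"]
# ul2 sees dl1 misses, dl1 dirty-block writebacks, and il1 misses.
assert s["ul2.accesses"] == s["dl1.misses"] + s["dl1.writebacks"] + s["il1.misses"]
```

The same accounting also holds for the filter run (3131 + 791 + 179 = 4101) and the alphaBlend run (2572213 + 957274 + 211 = 3529698).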

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:51:56 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
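The cache and TLB options above use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format. Capacity is therefore nsets * bsize * assoc bytes. For a TLB, bsize is the page size, so the same product gives the TLB reach.

The following is a minimal Python sketch, and `geometry` is a hypothetical helper. It also checks a pattern visible in this log's statistics. A miss that fills an empty frame is not counted as a replacement. So once every frame of a cache has been used, misses minus replacements equals the number of frames.

```python
def geometry(config):
    """Split '<name>:<nsets>:<bsize>:<assoc>:<repl>' and derive sizes."""
    name, nsets, bsize, assoc, repl = config.split(":")
    nsets, bsize, assoc = int(nsets), int(bsize), int(assoc)
    return {
        "name": name,
        "frames": nsets * assoc,           # block frames (TLB entries)
        "bytes": nsets * bsize * assoc,    # capacity, or TLB reach
        "repl": {"l": "LRU", "f": "FIFO", "r": "random"}[repl],
    }

for cfg in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
            "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    g = geometry(cfg)
    print(f"{g['name']:5} {g['frames']:5} frames {g['bytes'] // 1024:4} KiB {g['repl']}")

dl1 = geometry("dl1:512:64:2:l")
assert 287722 - 286698 == dl1["frames"]  # matrix run: dl1 misses - replacements
assert 3131 - 2107 == dl1["frames"]      # filter run: dl1 misses - replacements
```

Under this configuration, dl1 and il1 are 64 KiB each and ul2 is 512 KiB. The TLBs hold 64 and 128 entries, which gives 256 KiB and 512 KiB of reach. In the filter run, ul2 takes only 2489 misses, fewer than its 8192 frames, which is consistent with its replacement count of 0.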

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12666004 # total simulation time in cycles
sim_IPC                      1.6879 # instructions per cycle
sim_CPI                      0.5924 # cycles per instruction
sim_exec_BW                  1.6935 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49486534 # cumulative IFQ occupancy
IFQ_fcount                 11752461 # cumulative IFQ full count
ifq_occupancy                3.9070 # avg IFQ occupancy (insn's)
ifq_rate                     1.6935 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3071 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 201084674 # cumulative RUU occupancy
RUU_fcount                 12532336 # cumulative RUU full count
ruu_occupancy               15.8759 # avg RUU occupancy (insn's)
ruu_rate                     1.6935 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3745 # avg RUU occupant latency (cycle's)
ruu_full                     0.9894 # fraction of time (cycle's) RUU was full
LSQ_count                  64455343 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0888 # avg LSQ occupancy (insn's)
lsq_rate                     1.6935 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0049 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  293322584 # total number of slip cycles
avg_sim_slip                13.7200 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
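Most of the derived pipeline statistics in the filter run above are ratios of its raw counters. The following Python sketch reproduces the printed four-decimal values. The divisors were chosen to match those values, so treat them as an assumption rather than a reading of the simulator source.

The sketch computes:
- IPC and CPI from committed instructions and cycles.
- Queue occupancy as cumulative occupancy over cycles.
- Queue latency as cumulative occupancy over executed instructions. This is Little's law, latency = occupancy / rate.

A zero denominator is reported the same way the log reports `bpred_jr_non_ras_rate.PP`. `ratio` is a hypothetical helper.

```python
def ratio(num, den):
    """Four-decimal ratio; mirrors the log's output for a zero denominator."""
    return "<error: divide by zero>" if den == 0 else round(num / den, 4)

# Raw counters copied from the filter run.
insn, total_insn, cycles = 21379256, 21450114, 12666004
branches, slip = 1647342, 293322584
ifq_count, ruu_count = 49486534, 201084674

derived = {
    "sim_IPC": ratio(insn, cycles),
    "sim_CPI": ratio(cycles, insn),
    "sim_exec_BW": ratio(total_insn, cycles),
    "sim_IPB": ratio(insn, branches),
    "ifq_occupancy": ratio(ifq_count, cycles),
    "ifq_latency": ratio(ifq_count, total_insn),
    "ruu_occupancy": ratio(ruu_count, cycles),
    "ruu_latency": ratio(ruu_count, total_insn),
    "avg_sim_slip": ratio(slip, insn),
    "bpred_dir_rate": ratio(1639058, branches),   # dir_hits / updates
    "bpred_jr_non_ras_rate": ratio(0, 0),          # no non-RAS JRs seen
}
for name, value in derived.items():
    print(f"{name:22} {value}")
```

The RUU statistics are a good illustration of why the Little's-law form is useful. With a 16-entry RUU that is full 98.94% of the time, each instruction spends about 9.4 cycles in it at 1.69 instructions per cycle.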

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:52:09 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
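Two options set main-memory timing. The `-mem:lat` pair gives the latency of the first chunk and of each following chunk, and `-mem:width` sets the chunk size in bytes.

Stock sim-outorder charges one block fetch as:
- first_chunk + inter_chunk * (chunks - 1), where
- chunks = ceil(block size / bus width).

The Python sketch below applies that formula to the 64-byte ul2 blocks used here. It is an approximation: it does not model `-mem:maxBurstLength` or `-mem:minBurstLength`, or any bus contention, so the timing this build actually charges may differ. `mem_access_latency` is written here as a hypothetical stand-alone function.

```python
def mem_access_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to fetch one block: the first chunk, then each remaining chunk."""
    chunks = -(-block_bytes // bus_width)  # ceiling division
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)

# -mem:lat 91 10 with the 64-byte blocks of ul2:1024:64:8:l.
# The matrix run used -mem:width 8; the filter and alphaBlend runs use 16.
for width in (8, 16):
    print(width, mem_access_latency(64, width, 91, 10))
# 8 161
# 16 121
```

Doubling the bus width halves the number of chunks per 64-byte block, from 8 to 4. By this formula, that saves 40 cycles on every ul2 miss.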

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861563 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  39377674 # total simulation time in cycles
sim_IPC                      0.7075 # instructions per cycle
sim_CPI                      1.4134 # cycles per instruction
sim_exec_BW                  0.7075 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 157402105 # cumulative IFQ occupancy
IFQ_fcount                 39350286 # cumulative IFQ full count
ifq_occupancy                3.9972 # avg IFQ occupancy (insn's)
ifq_rate                     0.7075 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.6494 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9993 # fraction of time (cycle's) IFQ was full
RUU_count                 629612831 # cumulative RUU occupancy
RUU_fcount                 39349472 # cumulative RUU full count
ruu_occupancy               15.9891 # avg RUU occupancy (insn's)
ruu_rate                     0.7075 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.5979 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 191327318 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8588 # avg LSQ occupancy (insn's)
lsq_rate                     0.7075 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.8671 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  857448652 # total number of slip cycles
avg_sim_slip                30.7774 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:52:33 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6858050 # total simulation time in cycles
sim_IPC                      1.9112 # instructions per cycle
sim_CPI                      0.5232 # cycles per instruction
sim_exec_BW                  1.9173 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25530448 # cumulative IFQ occupancy
IFQ_fcount                  6257009 # cumulative IFQ full count
ifq_occupancy                3.7227 # avg IFQ occupancy (insn's)
ifq_rate                     1.9173 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9416 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105402023 # cumulative RUU occupancy
RUU_fcount                  5693170 # cumulative RUU full count
ruu_occupancy               15.3691 # avg RUU occupancy (insn's)
ruu_rate                     1.9173 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0160 # avg RUU occupant latency (cycle's)
ruu_full                     0.8301 # fraction of time (cycle's) RUU was full
LSQ_count                  32223850 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6987 # avg LSQ occupancy (insn's)
lsq_rate                     1.9173 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4507 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154676589 # total number of slip cycles
avg_sim_slip                11.8010 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:52:42 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264781 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6418519 # total simulation time in cycles
sim_IPC                      1.8044 # instructions per cycle
sim_CPI                      0.5542 # cycles per instruction
sim_exec_BW                  1.9108 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18965527 # cumulative IFQ occupancy
IFQ_fcount                  3913222 # cumulative IFQ full count
ifq_occupancy                2.9548 # avg IFQ occupancy (insn's)
ifq_rate                     1.9108 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5463 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6097 # fraction of time (cycle's) IFQ was full
RUU_count                  78108615 # cumulative RUU occupancy
RUU_fcount                  3311277 # cumulative RUU full count
ruu_occupancy               12.1693 # avg RUU occupancy (insn's)
ruu_rate                     1.9108 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3685 # avg RUU occupant latency (cycle's)
ruu_full                     0.5159 # fraction of time (cycle's) RUU was full
LSQ_count                  32316345 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0349 # avg LSQ occupancy (insn's)
lsq_rate                     1.9108 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6349 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124492056 # total number of slip cycles
avg_sim_slip                10.7492 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
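Each statistic line has a fixed `<name>  <value>  # <description>` layout, and several descriptions state their own formula (for example, miss rate = misses/ref). The minimal Python sketch below assumes that layout. `parse_stats` and `SAMPLE` are hypothetical names. It collects the numeric stats, skips values that are not plain numbers (such as `292k` or `<error: divide by zero>`), and recomputes the cache miss rates for the statistics block that ends above. In each statistics block in this section, `ul2.accesses` also equals `il1.misses + dl1.misses + dl1.writebacks`: 217 + 11205 + 7079 = 18501 here, 727 + 287722 + 143444 = 431893 for fft, and 179 + 3131 + 791 = 4101 for filter. This is consistent with a unified L2 fed by both L1 caches. That reading is an interpretation of the counts, not something taken from the simulator source.

```python
import re
from typing import Dict

# A stat line is "<name>  <value>  # <description>".
STAT_RE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> Dict[str, float]:
    """Collect numeric stat lines; skip lines whose value is not a plain number."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if m is None:
            continue  # e.g. "<error: divide by zero>" rows never match
        name, raw = m.groups()
        try:
            # ld_* base addresses are printed in hex with a 0x prefix.
            stats[name] = float(int(raw, 16)) if raw.startswith("0x") else float(raw)
        except ValueError:
            continue  # e.g. mem.page_mem is printed as "292k"
    return stats


# Counters copied from the statistics block that ends above.
SAMPLE = """\
il1.accesses               12820215 # total number of accesses
il1.misses                      217 # total number of misses
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
dl1.writebacks                 7079 # total number of writebacks
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
mem.page_mem                   292k # total size of memory pages allocated
"""

stats = parse_stats(SAMPLE)
for cache in ("il1", "dl1", "ul2"):
    rate = stats[cache + ".misses"] / stats[cache + ".accesses"]
    print(f"{cache}.miss_rate {rate:.4f}")

# Traffic reaching ul2 from the L1 level: both L1s' misses plus dl1 writebacks.
from_l1 = stats["il1.misses"] + stats["dl1.misses"] + stats["dl1.writebacks"]
print(from_l1 == stats["ul2.accesses"])
```

The printed rates round the same way as the `miss_rate` lines in the log (0.0000, 0.0025, 0.2268).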

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:52:49 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
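`-mem:lat 91 10` gives the `<first_chunk> <inter_chunk>` latencies, and `-mem:width` gives the bus width in bytes. Stock SimpleScalar 3.0 sim-outorder charges a block transfer as the first-chunk latency plus the inter-chunk latency for each additional bus-width chunk. The `-mem:maxBurstLength` and `-mem:minBurstLength` options are not part of stock sim-outorder and appear to be additions in this build, so the latency this build actually computes may differ. The Python sketch below (`mem_access_latency` is a hypothetical name) models only the stock rule and ignores the burst options.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int = 91, inter_chunk: int = 10) -> int:
    """Cycles to transfer one block over the memory bus (stock-style model).

    The first bus-width chunk costs first_chunk cycles and each further chunk
    costs inter_chunk cycles; burst-length limits are not modeled.
    """
    chunks = -(-block_bytes // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l); -mem:lat 91 10.
print(mem_access_latency(64, 32))  # -mem:width 32, as in this run
print(mem_access_latency(64, 8))   # -mem:width 8, as on the matrix command line
```

Under this model, a 32-byte bus moves a 64-byte block in 2 chunks (91 + 10 = 101 cycles). An 8-byte bus needs 8 chunks (91 + 7 x 10 = 161 cycles).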

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
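The range grammar above can be sketched as a small Python parser. `Pos`, `parse_pos`, `to_int` and `parse_range` are hypothetical names. The sketch makes three assumptions. First, program symbols such as `main` are kept as text rather than resolved against a symbol table. Second, a `+` end takes the same units as the start. Third, numbers are read as decimal or `0x` hex.

```python
from typing import NamedTuple, Optional, Tuple

# '@' marks an address and '#' a cycle count; anything else is an instruction count.
KIND_BY_PREFIX = {"@": "addr", "#": "cycle"}


class Pos(NamedTuple):
    kind: str   # "addr", "cycle", or "inst"
    value: str  # a number or a program symbol (symbols are not resolved here)


def parse_pos(tok: str) -> Optional[Pos]:
    """Parse one bound; an empty token means the bound was omitted."""
    if not tok:
        return None
    if tok[0] in KIND_BY_PREFIX:
        return Pos(KIND_BY_PREFIX[tok[0]], tok[1:])
    return Pos("inst", tok)


def to_int(value: str) -> int:
    """Decimal or 0x-prefixed hex (an assumption about accepted number forms)."""
    return int(value, 16) if value.lower().startswith("0x") else int(value)


def parse_range(spec: str) -> Tuple[Optional[Pos], Optional[Pos]]:
    """Split '<start>:<end>' into (start, end); None marks an open end."""
    start_tok, sep, end_tok = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = parse_pos(start_tok)
    if not end_tok.startswith("+"):
        return start, parse_pos(end_tok)
    # '+' end: an offset from the start, assumed to be in the start's units.
    if start is None:
        raise ValueError("a '+' end needs an explicit start")
    delta = to_int(end_tok[1:])
    try:
        end_value = str(to_int(start.value) + delta)
    except ValueError:
        end_value = f"{start.value}+{delta}"  # symbolic start such as @main
    return start, Pos(start.kind, end_value)


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, parse_range(spec))
```

`1000:+100` comes out as an instruction range from 1000 to 1100, matching the `1000:+100 == 1000:1100` rule in the help text.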

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
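The `-bpred:2lev 1 1024 8 0` line in the option dump follows the order `<l1size> <l2size> <hist_size> <xor>`, which gives N=1, M=1024, W=8, X=0. The Python sketch below shows how a second-level counter index can be formed from these parameters. It is modeled on the stock SimpleScalar 2-level predictor as an assumption. The history (or history XOR address when X=1) fills the low W bits, the branch address fills the bits above them, and M masks the result. `l2_index` is a hypothetical name. `br_shift=3` assumes 8-byte PISA instructions.

```python
def l2_index(pc: int, history: int, hist_width: int, l2size: int,
             use_xor: bool, br_shift: int = 3) -> int:
    """Second-level counter index for a 2-level predictor (sketch).

    history:  contents of the selected first-level shift register (W bits)
    l2size:   M, the number of second-level counters (a power of two)
    use_xor:  X, XOR the history with address bits (gshare style)
    br_shift: low PC bits dropped first; 3 assumes 8-byte PISA instructions
    """
    addr = pc >> br_shift
    low = (1 << hist_width) - 1
    hist_bits = (history ^ addr) if use_xor else history
    # Low W bits come from the history; the address supplies the bits above,
    # and masking with M - 1 keeps only the bits that fit in the table.
    idx = (hist_bits & low) | (addr << hist_width)
    return idx & (l2size - 1)


pc, hist = 0x00400158, 0b10110011
# N=1, M=1024, W=8, X=0: M > 2^W, so 2 address bits remain in the index.
print(l2_index(pc, hist, hist_width=8, l2size=1024, use_xor=False))
# gshare-style: N=1, M=2^8, W=8, X=1.
print(l2_index(pc, hist, hist_width=8, l2size=256, use_xor=True))
```

With M = 2^W and X = 1, no address bits survive the mask above the history, so the address enters only through the XOR. This is the gshare row of the table.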

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
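The `<config>` string packs four fields and a policy letter, and the capacity follows directly as nsets x assoc x bsize. The runs here use a 64 KiB dl1 and il1 (512 sets, 64-byte blocks, 2-way) and a 512 KiB ul2 (1024 x 64 x 8). The Python sketch below (`CacheConfig` and `parse_cache_config` are hypothetical names) parses the format. It assumes that for TLB configs the `<bsize>` field is the page size (4096 in the `-tlb:*` options), in which case the same product gives TLB reach, for example 256 KiB for `itlb:16:4096:4:l`.

```python
from typing import NamedTuple

# Replacement-policy letters from the <repl> field.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB, assumed)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Bytes covered: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity // 1024} KiB, {cfg.assoc}-way {cfg.repl}")
```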

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375276 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9725144 # total simulation time in cycles
sim_IPC                      1.3697 # instructions per cycle
sim_CPI                      0.7301 # cycles per instruction
sim_exec_BW                  1.3753 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  37873541 # cumulative IFQ occupancy
IFQ_fcount                  9317741 # cumulative IFQ full count
ifq_occupancy                3.8944 # avg IFQ occupancy (insn's)
ifq_rate                     1.3753 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.8316 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9581 # fraction of time (cycle's) IFQ was full
RUU_count                 152411402 # cumulative RUU occupancy
RUU_fcount                  9176782 # cumulative RUU full count
ruu_occupancy               15.6719 # avg RUU occupancy (insn's)
ruu_rate                     1.3753 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.3950 # avg RUU occupant latency (cycle's)
ruu_full                     0.9436 # fraction of time (cycle's) RUU was full
LSQ_count                  80469364 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2744 # avg LSQ occupancy (insn's)
lsq_rate                     1.3753 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.0163 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  252595712 # total number of slip cycles
avg_sim_slip                18.9623 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
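Several derived statistics in the fft block above are ratios of the raw counters. The formulas below are inferred from the stat descriptions; they are confirmed here only in the sense that they reproduce the printed values. `sim_IPC` is committed instructions over cycles. `ruu_occupancy` is `RUU_count / sim_cycle`. `ruu_rate` is executed instructions over cycles. `ruu_latency` is `RUU_count / sim_total_insn`. Together the RUU stats satisfy Little's law: occupancy = rate x latency. A minimal Python check:

```python
# Counters copied from the fft run's statistics above.
sim_num_insn = 13320908     # committed instructions
sim_total_insn = 13375276   # executed, including mis-speculated
sim_cycle = 9725144
RUU_count = 152411402       # RUU occupancy summed over every cycle

ipc = sim_num_insn / sim_cycle
cpi = sim_cycle / sim_num_insn
ruu_rate = sim_total_insn / sim_cycle       # avg RUU dispatch rate
ruu_occupancy = RUU_count / sim_cycle       # avg entries held in the RUU
ruu_latency = RUU_count / sim_total_insn    # avg cycles an inst spends in the RUU

print(f"sim_IPC {ipc:.4f}  sim_CPI {cpi:.4f}")
print(f"ruu_occupancy {ruu_occupancy:.4f} vs ruu_rate * ruu_latency "
      f"{ruu_rate * ruu_latency:.4f}")
```

The run uses `-ruu:size 16`, and the RUU holds about 15.67 entries on average with `ruu_full` at 0.9436, so the RUU is full for most of this run.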

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:53:01 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12621084 # total simulation time in cycles
sim_IPC                      1.6939 # instructions per cycle
sim_CPI                      0.5903 # cycles per instruction
sim_exec_BW                  1.6995 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49320774 # cumulative IFQ occupancy
IFQ_fcount                 11711021 # cumulative IFQ full count
ifq_occupancy                3.9078 # avg IFQ occupancy (insn's)
ifq_rate                     1.6995 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2993 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200420654 # cumulative RUU occupancy
RUU_fcount                 12490896 # cumulative RUU full count
ruu_occupancy               15.8798 # avg RUU occupancy (insn's)
ruu_rate                     1.6995 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3436 # avg RUU occupant latency (cycle's)
ruu_full                     0.9897 # fraction of time (cycle's) RUU was full
LSQ_count                  64247103 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0905 # avg LSQ occupancy (insn's)
lsq_rate                     1.6995 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9952 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292450444 # total number of slip cycles
avg_sim_slip                13.6792 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
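Every statistic line above has the shape `<name> <value> # <description>`, and each printed rate is a ratio of the raw counters next to it. Below is a minimal Python sketch for reading a saved `-redir:sim` file back in. `STAT_RE` and `parse_stats` are hypothetical helpers, not part of SimpleScalar, and the sample lines are copied from the run above. Non-numeric values such as `0x00400000`, `120k` or `<error: divide by zero>` are kept as strings.

```python
import re

# "<name> <value> # <description>"; the value itself may contain spaces,
# e.g. "<error: divide by zero>", so match it lazily up to " #".
STAT_RE = re.compile(r"^(\S+)\s+(.+?)\s+#\s*(.*)$")


def parse_stats(text):
    """Map statistic names to int, float, or (if non-numeric) str values."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):  # skip commented-out option lines
            continue
        m = STAT_RE.match(line)
        if not m:
            continue
        name, raw, _desc = m.groups()
        for convert in (int, float):
            try:
                stats[name] = convert(raw)
                break
            except ValueError:
                pass
        else:
            stats[name] = raw  # e.g. "0x00400000", "120k", "<error: ...>"
    return stats


sample = """\
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
mem.page_mem                   120k # total size of memory pages allocated
"""
s = parse_stats(sample)
print(round(s["dl1.misses"] / s["dl1.accesses"], 4))  # dl1.miss_rate
print(round(s["ul2.misses"] / s["ul2.accesses"], 4))  # ul2.miss_rate
```

The regular expression is tuned to statistic lines. Help-text lines that happen to contain ` #` will also match, so filter by name prefix (`dl1.`, `sim_`, ...) when you need a clean set.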

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:53:15 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
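Here is a small Python sketch of the range grammar just described, useful for checking a `-ptrace` argument before launching a long run. `parse_ptrace_range` is a hypothetical helper. It only classifies and splits the range, leaves program symbols such as `main` as strings (the simulator resolves those itself), and resolves a `+` end only when the start is numeric.

```python
KINDS = {"@": "addr", "#": "cycle"}


def _value(tok):
    """Empty -> None, numeric -> int (decimal or 0x hex), otherwise a symbol."""
    if tok == "":
        return None
    try:
        return int(tok, 0)
    except ValueError:
        return tok  # program symbol such as "main" (resolved by the simulator)


def parse_ptrace_range(spec):
    """Return (kind, start, end, relative) for a -ptrace range like '@main:+278'."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    kind = "insn"  # default: instruction-count range
    if start[:1] in KINDS:
        kind, start = KINDS[start[0]], start[1:]
    relative = end[:1] == "+"
    if end[:1] in KINDS:
        kind, end = KINDS[end[0]], end[1:]
    elif relative:
        end = end[1:]
    lo, hi = _value(start), _value(end)
    if relative and isinstance(lo, int) and isinstance(hi, int):
        hi, relative = lo + hi, False  # e.g. 1000:+100 == 1000:1100
    return kind, lo, hi, relative


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, parse_ptrace_range(spec))
```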

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in the same N, M, W, X order):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N*2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
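For illustration, here is a Python sketch of the `gshare` sample: one global shift register of W bits, 2^W two-bit counters, and the history XORed with branch-address bits to form the counter index. This is the textbook scheme, not a transcription of SimpleScalar's `2lev` indexing code. The 3-bit address shift (assuming 8-byte PISA instructions) and the "weakly taken" initial counter value are assumptions. The runs in this log use `-bpred bimod`, so nothing here reproduces their numbers.

```python
class Gshare:
    """Textbook gshare: a W-bit global history XORed with branch-address bits
    indexes a table of 2**W two-bit saturating counters."""

    def __init__(self, w):
        self.mask = (1 << w) - 1
        self.history = 0                 # one global shift register (N = 1)
        self.counters = [2] * (1 << w)   # M = 2**W, start "weakly taken"

    def _index(self, pc):
        # Drop the low 3 PC bits, assuming 8-byte PISA instructions.
        return ((pc >> 3) ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# A branch that strictly alternates taken / not-taken: a single bimodal
# counter mispredicts every other outcome, but the history bits give the
# two cases separate counters.
g = Gshare(w=4)
pc = 0x00400140
misses = []
for k in range(100):
    taken = k % 2 == 0
    misses.append(g.predict(pc) != taken)
    g.update(pc, taken)
print("mispredictions in the last 20 branches:", sum(misses[-20:]))
```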

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
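The `<config>` string packs the cache geometry into colon-separated fields, so capacity is nsets x assoc x bsize. The Python sketch below decodes the strings used in these runs. The names `CacheConfig` and `parse_cache_config` are hypothetical, and the sketch handles only the five-field form, not the `dl1`/`dl2`/`none` aliases. The same format is used for the TLBs, where `<bsize>` is the page size, so the "capacity" of `dtlb:32:4096:4:l` is its reach.

```python
from dataclasses import dataclass

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size when used for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self):
        """Bytes covered: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize

    @property
    def offset_bits(self):
        return (self.bsize - 1).bit_length()   # assumes a power of two

    @property
    def index_bits(self):
        return (self.nsets - 1).bit_length()   # assumes a power of two


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(spec)
    print(c.name, c.capacity // 1024, "KiB", c.repl,
          "offset bits:", c.offset_bits, "index bits:", c.index_bits)
```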



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861403 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  38473094 # total simulation time in cycles
sim_IPC                      0.7241 # instructions per cycle
sim_CPI                      1.3810 # cycles per instruction
sim_exec_BW                  0.7242 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
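The headline rates are simple ratios of the counters printed above them. The following Python check uses the alphaBlend run's numbers (`-mem:width 32`) and reproduces `sim_IPC`, `sim_CPI`, `sim_exec_BW` and `sim_IPB` to four decimals. The variable names simply mirror the statistic names.

```python
# Counters copied from the alphaBlend run above.
sim_num_insn = 27859692      # committed
sim_total_insn = 27861403    # executed, including mis-speculated
sim_num_branches = 481706
sim_cycle = 38473094

ipc = sim_num_insn / sim_cycle          # sim_IPC
cpi = sim_cycle / sim_num_insn          # sim_CPI, equivalently 1 / ipc
exec_bw = sim_total_insn / sim_cycle    # sim_exec_BW
ipb = sim_num_insn / sim_num_branches   # sim_IPB
wasted = sim_total_insn - sim_num_insn  # executed but never committed

print(f"IPC={ipc:.4f} CPI={cpi:.4f} exec_BW={exec_bw:.4f} IPB={ipb:.4f} wasted={wasted}")
```

The gap between executed and committed instructions is the work squashed after mis-speculation, which is why `sim_exec_BW` sits slightly above `sim_IPC`.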
IFQ_count                 153800505 # cumulative IFQ occupancy
IFQ_fcount                 38449886 # cumulative IFQ full count
ifq_occupancy                3.9976 # avg IFQ occupancy (insn's)
ifq_rate                     0.7242 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5202 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 615205571 # cumulative RUU occupancy
RUU_fcount                 38449112 # cumulative RUU full count
ruu_occupancy               15.9905 # avg RUU occupancy (insn's)
ruu_rate                     0.7242 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.0809 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 186974298 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8599 # avg LSQ occupancy (insn's)
lsq_rate                     0.7242 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.7109 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  838688452 # total number of slip cycles
avg_sim_slip                30.1040 # the average slip between issue and retirement
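The `*_occupancy`, `*_rate` and `*_latency` lines for the IFQ, RUU and LSQ are each derived from one cumulative counter, and they satisfy Little's law (occupancy = rate x latency) by construction. The Python sketch below reproduces them for the alphaBlend run. It assumes, as the printed values indicate, that rate and latency are taken over `sim_total_insn`, while `avg_sim_slip` is taken over committed instructions.

```python
sim_cycle = 38473094
sim_total_insn = 27861403
sim_num_insn = 27859692
sim_slip = 838688452

# Cumulative occupancy counters (sum over cycles of entries in use).
counts = {"IFQ": 153800505, "RUU": 615205571, "LSQ": 186974298}

for name, count in counts.items():
    occupancy = count / sim_cycle        # *_occupancy: avg entries in use
    rate = sim_total_insn / sim_cycle    # *_rate: insts entering per cycle
    latency = count / sim_total_insn     # *_latency: avg cycles per inst
    # Little's law (L = lambda * W); exact here because of the definitions.
    assert abs(occupancy - rate * latency) < 1e-9
    print(f"{name}: occupancy={occupancy:.4f} rate={rate:.4f} latency={latency:.4f}")

avg_slip = sim_slip / sim_num_insn
print(f"avg_sim_slip={avg_slip:.4f}")
```

With `-ruu:size 16`, an average RUU occupancy of 15.9905 and `ruu_full` of 0.9994 mean the RUU is full in nearly every cycle of this run.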
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
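The predictor rates are hit counts divided by `updates`, or by the JR and RAS counts for the indirect-jump rates. The `<error: divide by zero>` lines in the matrix runs appear where the denominator (`jr_non_ras_seen`) is 0. The Python sketch below recomputes the alphaBlend values and returns `None` in that case. `rate` is a hypothetical helper. The identity `misses == updates - dir_hits` is an observation that holds for all three runs in this log, not a documented invariant.

```python
def rate(hits, total):
    """hits / total, or None where sim-outorder prints <error: divide by zero>."""
    return hits / total if total else None


# bpred_bimod counters from the alphaBlend run above.
bp = {"updates": 481706, "addr_hits": 481471, "dir_hits": 481589,
      "misses": 117, "jr_hits": 72, "jr_seen": 81,
      "jr_non_ras_hits": 0, "jr_non_ras_seen": 1,
      "used_ras": 80, "ras_hits": 72}

# Observed in every run of this log: misses == updates - dir_hits.
assert bp["misses"] == bp["updates"] - bp["dir_hits"]

print("bpred_addr_rate", round(rate(bp["addr_hits"], bp["updates"]), 4))
print("bpred_dir_rate ", round(rate(bp["dir_hits"], bp["updates"]), 4))
print("bpred_jr_rate  ", round(rate(bp["jr_hits"], bp["jr_seen"]), 4))
print("ras_rate       ", round(rate(bp["ras_hits"], bp["used_ras"]), 4))
print("jr_non_ras_rate", rate(bp["jr_non_ras_hits"], bp["jr_non_ras_seen"]))
print(rate(0, 0))  # None: the case printed as <error: divide by zero>
```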
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
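All four dl1 rates share the same denominator, `dl1.accesses`. The Python sketch below checks them for the alphaBlend run. It also checks an observation from this log: misses minus replacements is exactly 1024, the number of block frames in `dl1:512:64:2:l` (512 sets x 2 ways). This is consistent with the first fill of each empty frame not being counted as a replacement, and the same difference appears in the other dl1 blocks here.

```python
# dl1 counters from the alphaBlend run above.
dl1 = {"accesses": 8653398, "hits": 6081185, "misses": 2572213,
       "replacements": 2571189, "writebacks": 957274}

assert dl1["hits"] + dl1["misses"] == dl1["accesses"]
# Observation: cold fills of the 512 x 2 empty frames are not replacements.
assert dl1["misses"] - dl1["replacements"] == 512 * 2

for key in ("misses", "replacements", "writebacks"):
    print(f"{key}/ref = {dl1[key] / dl1['accesses']:.4f}")
```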
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
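In this configuration `-cache:il2 dl2` makes `ul2` a unified L2, and its access count equals dl1 misses + dl1 writebacks + il1 misses in every run of this log. The sketch below checks that for the alphaBlend run and then forms a rough textbook average memory access time (AMAT) for data references. The memory term assumes the stock SimpleScalar 3.0 model: first-chunk latency plus the inter-chunk latency for each further bus-width chunk of a 64-byte block. It ignores the `-mem:maxBurstLength`/`-mem:minBurstLength` options of this build, TLB misses, bus contention, and the miss overlap an out-of-order core achieves. Treat the result as an estimate, not a simulator output.

```python
# Counters from the alphaBlend run above.
dl1_accesses, dl1_misses, dl1_writebacks = 8653398, 2572213, 957274
il1_misses = 211
ul2_accesses, ul2_misses, ul2_replacements = 3529698, 68321, 60129

# Every ul2 access is a dl1 miss fill, a dl1 dirty writeback, or an il1 miss.
assert dl1_misses + dl1_writebacks + il1_misses == ul2_accesses

# Observation: misses - replacements == ul2 block frames (1024 sets x 8 ways).
assert ul2_misses - ul2_replacements == 1024 * 8

# Rough AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem), under the assumptions above.
dl1_lat, ul2_lat = 3, 12              # -cache:dl1lat, -cache:dl2lat
chunks = -(-64 // 32)                 # ceil(block size / -mem:width)
mem_lat = 91 + 10 * (chunks - 1)      # -mem:lat 91 10
m1 = dl1_misses / dl1_accesses        # dl1.miss_rate
m2 = ul2_misses / ul2_accesses        # ul2.miss_rate (local)
amat = dl1_lat + m1 * (ul2_lat + m2 * mem_lat)
print(f"mem_lat={mem_lat} cycles, AMAT ~ {amat:.2f} cycles")
```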
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
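TLB reach is entries x page size, where entries = sets x ways from the `-tlb:itlb itlb:16:4096:4:l` and `-tlb:dtlb dtlb:32:4096:4:l` options. The Python sketch below covers the alphaBlend run. The page count comes from `mem.page_count` a few lines below. The `misses x -tlb:lat` product is only a rough, non-overlapped estimate of TLB stall cycles, not a simulator statistic.

```python
PAGE = 4096  # bytes; the <bsize> field of the TLB configs

itlb_entries = 16 * 4   # itlb:16:4096:4:l -> sets x ways
dtlb_entries = 32 * 4   # dtlb:32:4096:4:l

print("itlb reach", itlb_entries * PAGE // 1024, "KiB")
print("dtlb reach", dtlb_entries * PAGE // 1024, "KiB")

# alphaBlend allocated 370 pages (mem.page_count), more than the 128 dtlb
# entries, so entries must be replaced; the preceding run allocated only 30.
dtlb_misses, dtlb_replacements, tlb_lat = 1073, 945, 30
# Observation: as in the caches, cold fills are not counted as replacements.
assert dtlb_misses - dtlb_replacements == dtlb_entries
print("non-overlapped dtlb miss cycles:", dtlb_misses * tlb_lat)
```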
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
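`mem.page_mem` is `mem.page_count` times the simulated page size (4 KiB, which matches both runs here), and `mem.ptab_miss_rate` is `ptab_misses / ptab_accesses`. As far as we understand SimpleScalar's memory module, these `mem.*` counters describe the simulator's own table of simulated memory pages, not the modeled TLBs. The Python sketch below recomputes both lines for the two runs whose `mem.*` lines appear above.

```python
PAGE_KIB = 4  # assumed simulated page size (4096 bytes)

runs = {  # (mem.page_count, mem.ptab_misses, mem.ptab_accesses) from this log
    "previous run": (30, 12303390, 178886464),
    "alphaBlend": (370, 2888570, 152017734),
}

summary = {}
for name, (pages, misses, accesses) in runs.items():
    summary[name] = (f"{pages * PAGE_KIB}k", round(misses / accesses, 4))
    print(name, "page_mem =", summary[name][0], "ptab_miss_rate =", summary[name][1])
```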

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:53:38 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in the same N, M, W, X order):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N*2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6838420 # total simulation time in cycles
sim_IPC                      1.9167 # instructions per cycle
sim_CPI                      0.5217 # cycles per instruction
sim_exec_BW                  1.9228 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25458928 # cumulative IFQ occupancy
IFQ_fcount                  6239129 # cumulative IFQ full count
ifq_occupancy                3.7229 # avg IFQ occupancy (insn's)
ifq_rate                     1.9228 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9362 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105115163 # cumulative RUU occupancy
RUU_fcount                  5675290 # cumulative RUU full count
ruu_occupancy               15.3713 # avg RUU occupancy (insn's)
ruu_rate                     1.9228 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9942 # avg RUU occupant latency (cycle's)
ruu_full                     0.8299 # fraction of time (cycle's) RUU was full
LSQ_count                  32128010 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6982 # avg LSQ occupancy (insn's)
lsq_rate                     1.9228 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4434 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154294069 # total number of slip cycles
avg_sim_slip                11.7718 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
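The branch-predictor rates in the run above are plain ratios of the raw counters, as their comments state. The sketch below recomputes them from this run's numbers; the helper name `rate` is illustrative, and a zero denominator returns `None`, mirroring the `<error: divide by zero>` entry (no non-RAS JRs were seen in this run). Note also that `misses` (10191) is exactly `updates` minus `dir_hits`.

```python
# Raw bpred_bimod.* counters copied from the run above.
counters = {
    "updates": 1010850,
    "addr_hits": 1000567,
    "dir_hits": 1000659,
    "jr_hits": 46,
    "jr_seen": 52,
    "jr_non_ras_hits": 0,
    "jr_non_ras_seen": 0,
    "used_ras": 52,
    "ras_hits": 46,
}


def rate(num, den):
    """num/den rounded to 4 places, or None for a zero denominator."""
    if den == 0:
        return None  # reported in the log as "<error: divide by zero>"
    return round(num / den, 4)


rates = {
    "bpred_addr_rate": rate(counters["addr_hits"], counters["updates"]),
    "bpred_dir_rate": rate(counters["dir_hits"], counters["updates"]),
    "bpred_jr_rate": rate(counters["jr_hits"], counters["jr_seen"]),
    "bpred_jr_non_ras_rate": rate(counters["jr_non_ras_hits"],
                                  counters["jr_non_ras_seen"]),
    "ras_rate": rate(counters["ras_hits"], counters["used_ras"]),
}

for name, value in rates.items():
    shown = "<error: divide by zero>" if value is None else f"{value:.4f}"
    print(f"{name:<22} {shown}")
```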

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:53:47 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addrs (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
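The `-mem:lat 91 10` option gives a first-chunk and an inter-chunk latency, and `-mem:width` the memory bus width in bytes. In the stock sim-outorder source, a block transfer is split into ceil(block size / bus width) bus chunks costing `first + inter * (chunks - 1)` cycles. The sketch below assumes that stock formula; the `-mem:maxBurstLength`/`-mem:minBurstLength` options of this build do not appear in the stock release as far as we know, and their effect is not modelled.

```python
def mem_access_latency(block_bytes, bus_width_bytes, first_chunk, inter_chunk):
    """Cycles to move one cache block from memory (stock sim-outorder model).

    Assumption: the block is split into ceil(block / bus width) chunks; the
    burst-length options of this particular build are not modelled.
    """
    chunks = (block_bytes + bus_width_bytes - 1) // bus_width_bytes
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l) and -mem:lat is 91 10.
print(mem_access_latency(64, 64, 91, 10))  # 91  (one chunk on a 64-byte bus)
print(mem_access_latency(64, 8, 91, 10))   # 161 (eight chunks on an 8-byte bus)
```

With `-mem:width 64`, a whole 64-byte L2 block fits in a single chunk, so only the first-chunk latency applies.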

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
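The range grammar above can be sketched as a small parser. This is an illustrative reading of the help text, not SimpleScalar's own range code. `parse_range` is a hypothetical name, `symbols` is a hypothetical name-to-address map standing in for the program's symbol table, and rejecting endpoints of different kinds (e.g. `@2000:3000`) is an assumption of this sketch.

```python
import re

# One endpoint: optional prefix ('@' address, '#' cycle, '+' relative)
# followed by a number (decimal or 0x-hex) or a program symbol name.
_ENDPOINT = re.compile(r"^([@#+]?)(\w*)$")
_KINDS = {"@": "addr", "#": "cycle", "": "insn"}


def parse_range(spec, symbols=None):
    """Parse a pipetrace range such as '#0:#1000' or '@main:+278'.

    Returns (kind, start, end); kind is 'addr', 'cycle' or 'insn' and an
    omitted endpoint is None.
    """
    symbols = symbols or {}
    start_txt, end_txt = spec.split(":", 1)
    m_start, m_end = _ENDPOINT.match(start_txt), _ENDPOINT.match(end_txt)
    if m_start is None or m_end is None or m_start.group(1) == "+":
        raise ValueError(f"malformed range: {spec!r}")

    def value(token):
        if token == "":
            return None
        if token in symbols:
            return symbols[token]
        return int(token, 0)  # accepts "1500" or "0x400140"

    start_prefix, start = m_start.group(1), value(m_start.group(2))
    end_prefix, end = m_end.group(1), value(m_end.group(2))

    if end_prefix == "+":  # end is relative to start
        if start is None or end is None:
            raise ValueError(f"relative end needs both values: {spec!r}")
        return _KINDS[start_prefix], start, start + end
    if start_txt and end_txt and start_prefix != end_prefix:
        raise ValueError(f"endpoints of different kinds: {spec!r}")
    return _KINDS[start_prefix or end_prefix], start, end


for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
    print(spec, parse_range(spec))
print(parse_range("@main:+278", {"main": 0x00400200}))  # hypothetical address
```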

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
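The N, M, W, X parameters (in the `-bpred:2lev <l1size> <l2size> <hist_size> <xor>` order shown in the option dump) combine into a second-level table index. The sketch below is modelled on our reading of SimpleScalar's bpred.c and is not verified against this build. Low PC bits select a first-level history register, its W history bits form the low part of the index, and any remaining index bits up to log2(M) come from the PC, which is why the per-address variants need M > 2^W. The function name `twolev_index` and the `br_shift=3` default (8-byte PISA instructions) are assumptions.

```python
def twolev_index(pc, history_regs, n, m, w, xor, br_shift=3):
    """Second-level (pattern table) index for '-bpred:2lev N M W X'.

    history_regs: the N first-level shift registers, each W bits wide.
    br_shift: assumed log2 of the PISA instruction size (8 bytes -> 3).
    N and M are assumed to be powers of two so masking acts as modulo.
    """
    addr = pc >> br_shift
    hist = history_regs[addr & (n - 1)]  # first level: low PC bits pick a register
    if xor:
        # gshare-style: XOR PC bits into the W history bits
        idx = ((hist ^ addr) & ((1 << w) - 1)) | (addr << w)
    else:
        # history in the low W bits, PC bits above them
        idx = hist | (addr << w)
    return idx & (m - 1)  # keep only log2(M) bits


h = 0b10110011
print(twolev_index(0x400148, [h], 1, 256, 8, False))   # GAg: history only -> 179
print(twolev_index(0x400148, [h], 1, 1024, 8, False))  # GAp (default 1 1024 8 0 shape) -> 435
print(twolev_index(0x400148, [h], 1, 256, 8, True))    # gshare -> 154
```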

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
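The `<name>:<nsets>:<bsize>:<assoc>:<repl>` format fixes a cache's geometry: capacity is nsets x bsize x assoc. The sketch below (the names `CacheConfig` and `parse_cache_config` are illustrative) parses the configurations used in these runs. For the TLBs, `<bsize>` is the page size, so the same product gives the TLB reach. It does not check the power-of-two constraints the simulator may impose.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def entries(self):
        return self.nsets * self.assoc

    @property
    def capacity(self):
        """Bytes held by a cache, or bytes mapped (reach) by a TLB."""
        return self.entries * self.bsize


def parse_cache_config(text):
    """Split '<name>:<nsets>:<bsize>:<assoc>:<repl>' into its fields."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r} in {text!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for spec in ("il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(spec)
    print(f"{c.name:<5} {c.entries:>5} entries  {c.capacity // 1024:>4} KiB  {c.repl}")
```

For these runs, il1 and dl1 each come out at 64 KiB (1024 blocks), ul2 at 512 KiB, and the 64-entry itlb and 128-entry dtlb map 256 KiB and 512 KiB respectively.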



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264741 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6391299 # total simulation time in cycles
sim_IPC                      1.8121 # instructions per cycle
sim_CPI                      0.5519 # cycles per instruction
sim_exec_BW                  1.9190 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18865167 # cumulative IFQ occupancy
IFQ_fcount                  3888132 # cumulative IFQ full count
ifq_occupancy                2.9517 # avg IFQ occupancy (insns)
ifq_rate                     1.9190 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5382 # avg IFQ occupant latency (cycles)
ifq_full                     0.6083 # fraction of time (cycles) IFQ was full
RUU_count                  77706953 # cumulative RUU occupancy
RUU_fcount                  3286197 # cumulative RUU full count
ruu_occupancy               12.1582 # avg RUU occupancy (insns)
ruu_rate                     1.9190 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3358 # avg RUU occupant latency (cycles)
ruu_full                     0.5142 # fraction of time (cycles) RUU was full
LSQ_count                  32225767 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0421 # avg LSQ occupancy (insns)
lsq_rate                     1.9190 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6275 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  123999816 # total number of slip cycles
avg_sim_slip                10.7067 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen          434901 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
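Most averaged figures in the sort run's statistics are ratios of the raw counters. Queue `*_occupancy` is count / sim_cycle, `*_latency` is count / sim_total_insn, and `*_full` is fcount / sim_cycle. IPC, CPI, IPB and the cache miss rates follow their comments. These relationships were inferred by checking them against this run's numbers (each reproduces the printed value to four decimals), not quoted from the simulator source. The names `s` and `derived` are illustrative.

```python
# Raw counters copied from the sort run's statistics above.
s = {
    "sim_num_insn": 11581513, "sim_total_insn": 12264741,
    "sim_num_branches": 3128464, "sim_cycle": 6391299, "sim_slip": 123999816,
    "IFQ_count": 18865167, "IFQ_fcount": 3888132,
    "RUU_count": 77706953, "RUU_fcount": 3286197,
    "LSQ_count": 32225767,
    "dl1.misses": 11205, "dl1.accesses": 4497784,
    "ul2.misses": 4196, "ul2.accesses": 18501,
}

cyc, total = s["sim_cycle"], s["sim_total_insn"]
derived = {
    "sim_IPC": s["sim_num_insn"] / cyc,            # committed insts per cycle
    "sim_CPI": cyc / s["sim_num_insn"],
    "sim_exec_BW": total / cyc,                    # incl. mis-speculated insts
    "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
    "avg_sim_slip": s["sim_slip"] / s["sim_num_insn"],
    "ifq_occupancy": s["IFQ_count"] / cyc,
    "ifq_latency": s["IFQ_count"] / total,
    "ifq_full": s["IFQ_fcount"] / cyc,
    "ruu_occupancy": s["RUU_count"] / cyc,
    "ruu_latency": s["RUU_count"] / total,
    "ruu_full": s["RUU_fcount"] / cyc,
    "lsq_occupancy": s["LSQ_count"] / cyc,
    "lsq_latency": s["LSQ_count"] / total,
    "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
}

for name, value in derived.items():
    print(f"{name:<16} {value:.4f}")
```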

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:53:54 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addrs (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375196 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9537424 # total simulation time in cycles
sim_IPC                      1.3967 # instructions per cycle
sim_CPI                      0.7160 # cycles per instruction
sim_exec_BW                  1.4024 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  37151223 # cumulative IFQ occupancy
IFQ_fcount                  9137161 # cumulative IFQ full count
ifq_occupancy                3.8953 # avg IFQ occupancy (insns)
ifq_rate                     1.4024 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7776 # avg IFQ occupant latency (cycles)
ifq_full                     0.9580 # fraction of time (cycles) IFQ was full
RUU_count                 149519727 # cumulative RUU occupancy
RUU_fcount                  8996222 # cumulative RUU full count
ruu_occupancy               15.6772 # avg RUU occupancy (insns)
ruu_rate                     1.4024 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.1789 # avg RUU occupant latency (cycles)
ruu_full                     0.9433 # fraction of time (cycles) RUU was full
LSQ_count                  78748518 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2568 # avg LSQ occupancy (insns)
lsq_rate                     1.4024 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.8877 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  247983391 # total number of slip cycles
avg_sim_slip                18.6161 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen           89860 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
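Each statistics line has the shape `name value # description`. The value can be an integer counter, a fixed-point rate, a hex address, a size such as 760k, or an evaluation error such as `<error: divide by zero>`. The following is a minimal parser sketch, and `parse_stats` and `convert` are illustrative names, not SimpleScalar functions. It reads only the first dump after `sim: ** simulation statistics **`, so a file like tempOutput that concatenates several runs would need to be split on the dashed separator lines first.

```python
import re
from typing import Dict, Optional, Union

Stat = Optional[Union[int, float, str]]

# name, value (may contain spaces, e.g. "<error: divide by zero>"), description
STAT_LINE = re.compile(r"^(\S+)\s+(.+?)\s+#\s*(.*)$")


def convert(raw: str) -> Stat:
    if raw.startswith("<error"):
        return None                  # formula stat that could not be evaluated
    if raw.lower().startswith("0x"):
        return int(raw, 16)          # segment base addresses
    try:
        return int(raw)              # plain counters
    except ValueError:
        pass
    try:
        return float(raw)            # rates and formula stats, e.g. 1649960.0000
    except ValueError:
        return raw                   # anything else, e.g. "760k", kept verbatim


def parse_stats(text: str) -> Dict[str, Stat]:
    """Collect the stats printed after 'sim: ** simulation statistics **'."""
    stats: Dict[str, Stat] = {}
    in_dump = False
    for line in text.splitlines():
        if line.startswith("sim: ** simulation statistics **"):
            in_dump = True
            continue
        if not in_dump:
            continue
        match = STAT_LINE.match(line)
        if match is None:
            break                    # blank line or separator ends the dump
        stats[match.group(1)] = convert(match.group(2))
    return stats
```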

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:54:06 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
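-mem:lat gives a first-chunk and an inter-chunk latency, and -mem:width gives the bus width in bytes. The sketch below shows the block-fetch cost these two options imply. It assumes a block moves in ceil(bsize / width) chunks, with the first chunk paying the full latency and each later chunk adding the inter-chunk latency, which is roughly how stock sim-outorder charges main-memory accesses. The -mem:maxBurstLength and -mem:minBurstLength options are not modeled, because the log does not say how this build uses them. The function names are illustrative.

```python
def chunks_per_block(block_bytes: int, bus_width_bytes: int) -> int:
    """Number of bus transfers for one block (ceiling division)."""
    if block_bytes <= 0 or bus_width_bytes <= 0:
        raise ValueError("sizes must be positive")
    return (block_bytes + bus_width_bytes - 1) // bus_width_bytes


def block_fetch_latency(block_bytes: int, bus_width_bytes: int,
                        first_chunk: int, inter_chunk: int) -> int:
    """Assumed model: the first chunk costs first_chunk cycles and every
    following chunk adds inter_chunk cycles. Burst-length limits are ignored."""
    chunks = chunks_per_block(block_bytes, bus_width_bytes)
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes and -mem:lat is 91 10.
print(block_fetch_latency(64, 64, 91, 10))  # -mem:width 64 -> 91
print(block_fetch_latency(64, 8, 91, 10))   # -mem:width 8  -> 161
```

Under this assumption, the 64-byte bus used here fetches a 64-byte ul2 block in 91 cycles. The 8-byte bus on the log's first command line would need 8 chunks, or 91 + 7 × 10 = 161 cycles.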

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
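The range grammar above is regular enough to parse mechanically. The following is an illustrative sketch, and `Bound` and `parse_range` are hypothetical names, not SimpleScalar functions. It accepts decimal or 0x-hex values and resolves a `+` end against a numeric start, as in 1000:+100 == 1000:1100. Program symbols such as `main` are left unresolved, since resolving them needs the simulated program's symbol table.

```python
from typing import NamedTuple, Optional, Tuple, Union

Value = Union[int, str]


class Bound(NamedTuple):
    kind: str     # "addr" (@), "cycle" (#), "insn" (no prefix) or "rel" (+, end only)
    value: Value  # int for numeric values, str for an unresolved program symbol


def _value(text: str) -> Value:
    if text.isdigit():
        return int(text)
    if text[:2].lower() == "0x":
        return int(text, 16)
    return text  # e.g. "main"; resolving it needs the symbol table


def _bound(text: str, allow_rel: bool) -> Optional[Bound]:
    if not text:
        return None  # this end of the range was omitted
    kinds = {"@": "addr", "#": "cycle"}
    if allow_rel:
        kinds["+"] = "rel"
    if text[0] in kinds:
        return Bound(kinds[text[0]], _value(text[1:]))
    return Bound("insn", _value(text))


def parse_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _bound(start_text, allow_rel=False)
    end = _bound(end_text, allow_rel=True)
    # "+N" is relative to a numeric start: 1000:+100 == 1000:1100.
    if (end is not None and end.kind == "rel" and start is not None
            and isinstance(start.value, int) and isinstance(end.value, int)):
        end = Bound(start.kind, start.value + end.value)
    return start, end


for spec in ["#0:#1000", "@2000:", ":1500", ":", "1000:+100", "@main:+278"]:
    print(spec, parse_range(spec))
```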

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
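The sample table lists each scheme in N, W, M, X order, but -bpred:2lev takes <l1size> <l2size> <hist_size> <xor>, which is N, M, W, X. The default `1 1024 8 0` above follows that order: N=1, M=1024, W=8, a GAp-style setting since 1024 > 2^8. The hypothetical helper below does the reordering for the table's GAg, GAp, PAg and gshare rows. PAp is left out of this sketch.

```python
from typing import Optional


def two_level_args(scheme: str, w: int, n: int = 1,
                   m: Optional[int] = None) -> str:
    """Build the '<l1size> <l2size> <hist_size> <xor>' string for
    -bpred:2lev from a sample-table row given as (N, W, M, X)."""
    if scheme == "GAg":
        n, m, x = 1, 2 ** w, 0
    elif scheme == "GAp":
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        n, x = 1, 0
    elif scheme == "PAg":
        m, x = 2 ** w, 0
    elif scheme == "gshare":
        n, m, x = 1, 2 ** w, 1
    else:
        raise ValueError(f"scheme not covered by this sketch: {scheme}")
    # Table order is N, W, M, X; the option order is N, M, W, X.
    return f"{n} {m} {w} {x}"


print(two_level_args("gshare", 8))        # 1 256 8 1
print(two_level_args("GAp", 8, m=1024))   # 1 1024 8 0, the default above
```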

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
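Each <config> string fixes the cache capacity at nsets × bsize × assoc bytes. The following is a minimal parser sketch, and `CacheConfig` and `parse_cache_config` are illustrative names, not part of SimpleScalar. It assumes that in TLB configs <bsize> is the page size, so the same product gives the TLB reach.

```python
from typing import NamedTuple

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # bytes per block (assumed to be the page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


for cfg in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"]:
    c = parse_cache_config(cfg)
    print(c.name, c.capacity_bytes // 1024, "KiB", c.repl)
```

For the configuration used in these runs, this gives 64 KiB each for dl1 and il1 (512:64:2), 512 KiB for ul2 (1024:64:8), and 512 KiB of reach for dtlb:32:4096:4:l.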



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12598624 # total simulation time in cycles
sim_IPC                      1.6970 # instructions per cycle
sim_CPI                      0.5893 # cycles per instruction
sim_exec_BW                  1.7026 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49237894 # cumulative IFQ occupancy
IFQ_fcount                 11690301 # cumulative IFQ full count
ifq_occupancy                3.9082 # avg IFQ occupancy (insn's)
ifq_rate                     1.7026 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2955 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200088644 # cumulative RUU occupancy
RUU_fcount                 12470176 # cumulative RUU full count
ruu_occupancy               15.8818 # avg RUU occupancy (insn's)
ruu_rate                     1.7026 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3281 # avg RUU occupant latency (cycle's)
ruu_full                     0.9898 # fraction of time (cycle's) RUU was full
LSQ_count                  64142983 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0913 # avg LSQ occupancy (insn's)
lsq_rate                     1.7026 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9903 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292014374 # total number of slip cycles
avg_sim_slip                13.6588 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
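Most rates in this dump are ratios of counters printed alongside them. The sketch below recomputes them from the filter run's raw counts, and the names `FILTER`, `ratio`, `derived` and `ul2_accesses_predicted` are illustrative. The formulas were checked only against the printed values in this log. For example, ruu_latency = RUU_count / sim_total_insn = 200088644 / 21450114 ≈ 9.3281, which is Little's law: occupancy / rate.

The ul2 relation is an observation, not a documented invariant. It states that il1.misses + dl1.misses + dl1.writebacks = ul2.accesses. It holds here (179 + 3131 + 791 = 4101) and for the preceding run (727 + 287722 + 143444 = 431893).

```python
from typing import Dict, Optional

# Raw counters copied from the filter run's statistics above.
FILTER: Dict[str, int] = {
    "sim_num_insn": 21379256, "sim_total_insn": 21450114,
    "sim_num_branches": 1647342, "sim_cycle": 12598624,
    "sim_slip": 292014374,
    "IFQ_count": 49237894, "IFQ_fcount": 11690301,
    "RUU_count": 200088644, "RUU_fcount": 12470176,
    "LSQ_count": 64142983, "LSQ_fcount": 0,
    "bpred_bimod.updates": 1647342, "bpred_bimod.dir_hits": 1639058,
    "bpred_bimod.jr_non_ras_hits.PP": 0, "bpred_bimod.jr_non_ras_seen.PP": 0,
    "il1.misses": 179, "dl1.misses": 3131, "dl1.writebacks": 791,
    "ul2.accesses": 4101,
}


def ratio(num: float, den: float) -> Optional[float]:
    """num / den, or None where sim-outorder prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


def derived(s: Dict[str, int]) -> Dict[str, Optional[float]]:
    cycles = s["sim_cycle"]
    committed = s["sim_num_insn"]
    executed = s["sim_total_insn"]          # includes mis-speculated insns
    out: Dict[str, Optional[float]] = {
        "sim_IPC": ratio(committed, cycles),
        "sim_CPI": ratio(cycles, committed),
        "sim_exec_BW": ratio(executed, cycles),
        "sim_IPB": ratio(committed, s["sim_num_branches"]),
        "avg_sim_slip": ratio(s["sim_slip"], committed),
        "bpred_dir_rate": ratio(s["bpred_bimod.dir_hits"], s["bpred_bimod.updates"]),
        "bpred_jr_non_ras_rate": ratio(s["bpred_bimod.jr_non_ras_hits.PP"],
                                       s["bpred_bimod.jr_non_ras_seen.PP"]),
    }
    for q in ("IFQ", "RUU", "LSQ"):
        key = q.lower()
        out[key + "_occupancy"] = ratio(s[q + "_count"], cycles)
        out[key + "_rate"] = ratio(executed, cycles)
        # Little's law: latency = occupancy / rate = count / executed insns.
        out[key + "_latency"] = ratio(s[q + "_count"], executed)
        out[key + "_full"] = ratio(s[q + "_fcount"], cycles)
    return out


def ul2_accesses_predicted(s: Dict[str, int]) -> int:
    # Observed in this log, not a documented invariant: each il1 miss,
    # dl1 miss and dl1 writeback shows up as one ul2 access.
    return s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]


for name, value in derived(FILTER).items():
    shown = "<error: divide by zero>" if value is None else f"{value:.4f}"
    print(f"{name:24s} {shown}")
```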

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:54:20 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         91 10 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 25 # total simulation time in seconds
sim_inst_rate          1114387.6800 # simulation speed (in insts/sec)
sim_total_insn             27861323 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  38020804 # total simulation time in cycles
sim_IPC                      0.7327 # instructions per cycle
sim_CPI                      1.3647 # cycles per instruction
sim_exec_BW                  0.7328 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 151999705 # cumulative IFQ occupancy
IFQ_fcount                 37999686 # cumulative IFQ full count
ifq_occupancy                3.9978 # avg IFQ occupancy (insn's)
ifq_rate                     0.7328 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4556 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 608001941 # cumulative RUU occupancy
RUU_fcount                 37998932 # cumulative RUU full count
ruu_occupancy               15.9913 # avg RUU occupancy (insn's)
ruu_rate                     0.7328 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.8224 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 184797788 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8604 # avg LSQ occupancy (insn's)
lsq_rate                     0.7328 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6328 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  829308352 # total number of slip cycles
avg_sim_slip                29.7673 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:54:45 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
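
The `-mem:lat` pair gives the latency of the first chunk of a memory transfer and of each later chunk, and `-mem:width` sets the chunk size in bytes. The sketch below estimates the cost of refilling one 64-byte ul2 block. It assumes this build follows the stock SimpleScalar 3.0 sim-outorder rule (first-chunk latency plus inter-chunk latency for every remaining bus-width chunk). It also ignores `-mem:maxBurstLength` and `-mem:minBurstLength`, which are not stock options and whose effect this log does not document. The helper name is hypothetical.

```python
def mem_access_latency(block_bytes: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to move one cache block over the memory bus.

    Hypothetical helper assuming the stock sim-outorder formula; the
    burst-length options of this modified build are ignored.
    """
    # Round up: a partial chunk still costs a full bus transfer.
    chunks = (block_bytes + bus_width - 1) // bus_width
    if chunks <= 0:
        raise ValueError("block size must be positive")
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # ul2:1024:64:8:l -> 64-byte blocks, -mem:width 8 -> 8 chunks per block.
    print(mem_access_latency(64, 80, 8, 8))   # -mem:lat 80 8 (this run)
    print(mem_access_latency(64, 91, 10, 8))  # -mem:lat 91 10 (first run in the log)
```

Under that assumption, a ul2 miss costs about 136 cycles of memory latency with `80 8` and about 161 cycles with `91 10`.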

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
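
The range grammar above is small enough to sketch as a parser. This is a hypothetical Python helper, not SimpleScalar's own range code. It makes three assumptions: a `+` end inherits the kind (address, cycle or instruction) of the start; 0x-prefixed values are hex and plain digits are decimal; and symbols such as `main` stay unresolved because no symbol table is available here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Value = Union[int, str]  # an integer, or a program symbol such as "main"


@dataclass
class RangePos:
    kind: str               # "addr" (@), "cycle" (#) or "insn" (no prefix)
    value: Value
    relative: bool = False  # True only for an unresolved "+offset" end


def _parse_value(text: str) -> Value:
    # Assumption: 0x-prefixed values are hex, plain digits are decimal,
    # anything else is kept as a symbol name for later lookup.
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)
    if not text:
        raise ValueError("missing range value")
    return text


def _parse_pos(text: str) -> RangePos:
    if text.startswith("@"):
        return RangePos("addr", _parse_value(text[1:]))
    if text.startswith("#"):
        return RangePos("cycle", _parse_value(text[1:]))
    return RangePos("insn", _parse_value(text))


def parse_ptrace_range(spec: str) -> Tuple[Optional[RangePos], Optional[RangePos]]:
    """Split '{{@|#}<start>}:{{@|#|+}<end>}' into (start, end); None = open end."""
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError("range needs a ':' separator")
    start = _parse_pos(start_txt) if start_txt else None
    if not end_txt:
        return start, None
    if end_txt.startswith("+"):
        if start is None:
            raise ValueError("'+' end needs a start value")
        offset = _parse_value(end_txt[1:])
        if isinstance(start.value, int) and isinstance(offset, int):
            # Resolve immediately, e.g. 1000:+100 -> 1000:1100.
            return start, RangePos(start.kind, start.value + offset)
        # Symbolic start (e.g. @main): keep the offset for later resolution.
        return start, RangePos(start.kind, offset, relative=True)
    return start, _parse_pos(end_txt)
```

For example, `parse_ptrace_range("1000:+100")` yields an instruction range ending at 1100. `@main:+278` keeps 278 as a relative address offset until `main` can be looked up.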

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
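
A textbook two-level predictor sketch makes the N, W, M, X parameters concrete. It is written in Python and is not SimpleScalar's bpred.c. Several details are assumptions: how the index is formed (history concatenated with branch-address bits, or XORed with them when X = 1), the initial counter state, and the 3-bit shift that drops PISA's 8-byte instruction alignment. With N = 1, M = 2^W and X = 0, the address bits fall out of the index (GAg). A larger M keeps some address bits (GAp), and X = 1 gives gshare. The default `-bpred:2lev 1 1024 8 0` is therefore GAp-style (1024 > 2^8), although the runs in this log use `-bpred bimod`.

```python
class TwoLevelPredictor:
    """Textbook two-level direction predictor (a sketch, not bpred.c).

    n: first-level shift registers (N), m: second-level 2-bit counters (M),
    w: history width in bits (W), xor: 1 to XOR history with address (X).
    Argument order follows -bpred:2lev <l1size> <l2size> <hist_size> <xor>.
    """

    def __init__(self, n: int, m: int, w: int, xor: int):
        self.n, self.m, self.w, self.xor = n, m, w, bool(xor)
        self.hist_mask = (1 << w) - 1
        self.hist = [0] * n          # one history register per first-level entry
        self.counters = [2] * m      # assumption: counters start weakly taken

    @staticmethod
    def _pc_bits(addr: int) -> int:
        return addr >> 3             # assumption: 8-byte PISA instructions

    def _l1(self, addr: int) -> int:
        return self._pc_bits(addr) % self.n

    def _l2(self, addr: int) -> int:
        h = self.hist[self._l1(addr)]
        pc = self._pc_bits(addr)
        # gshare-style XOR, or address bits concatenated above the history.
        idx = (h ^ pc) if self.xor else ((pc << self.w) | h)
        return idx % self.m

    def predict(self, addr: int) -> bool:
        return self.counters[self._l2(addr)] >= 2

    def update(self, addr: int, taken: bool) -> None:
        i = self._l2(addr)           # index with the pre-update history
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        j = self._l1(addr)
        self.hist[j] = ((self.hist[j] << 1) | int(taken)) & self.hist_mask


if __name__ == "__main__":
    # GAg with N=1, W=2, M=2^2 learns a strictly alternating branch.
    p = TwoLevelPredictor(1, 4, 2, 0)
    pc = 0x400140
    for _ in range(10):
        p.update(pc, True)
        p.update(pc, False)
    print(p.predict(pc))  # -> True (the next outcome in the pattern is taken)
```

A bimodal predictor, as used in these runs, is roughly the degenerate case with W = 0: each counter is selected by address bits alone.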

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
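
Multiplying <nsets>, <bsize> and <assoc> gives the total size, so the configurations in these runs can be checked by hand. dl1 and il1 (512:64:2) hold 64 KiB each, and ul2 (1024:64:8) holds 512 KiB. For a TLB, the block size is the page size, so the same product gives the TLB's reach. The following is a minimal Python sketch; the class and function names are hypothetical, and the offset/index split assumes power-of-two sizes, as in every config shown here.

```python
from dataclasses import dataclass

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


def _log2_exact(n: int) -> int:
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (the page size, for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        """Total blocks (TLB entries, for a TLB)."""
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        """Data capacity; for a TLB this is its reach."""
        return self.nsets * self.bsize * self.assoc

    def address_bits(self) -> tuple:
        """(block-offset bits, set-index bits); assumes power-of-two sizes."""
        return _log2_exact(self.bsize), _log2_exact(self.nsets)


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (hypothetical helper)."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
        c = parse_cache_config(spec)
        print(c.name, c.blocks, c.capacity_bytes // 1024, "KiB", c.address_bits())
```

For dl1, a 64-byte block and 512 sets split an address into 6 offset bits and 9 index bits. The remaining high bits form the tag.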

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6926755 # total simulation time in cycles
sim_IPC                      1.8922 # instructions per cycle
sim_CPI                      0.5285 # cycles per instruction
sim_exec_BW                  1.8983 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25780768 # cumulative IFQ occupancy
IFQ_fcount                  6319589 # cumulative IFQ full count
ifq_occupancy                3.7219 # avg IFQ occupancy (insn's)
ifq_rate                     1.8983 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9607 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 106406033 # cumulative RUU occupancy
RUU_fcount                  5755750 # cumulative RUU full count
ruu_occupancy               15.3616 # avg RUU occupancy (insn's)
ruu_rate                     1.8983 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0923 # avg RUU occupant latency (cycle's)
ruu_full                     0.8309 # fraction of time (cycle's) RUU was full
LSQ_count                  32559290 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7005 # avg LSQ occupancy (insn's)
lsq_rate                     1.8983 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4762 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  156015409 # total number of slip cycles
avg_sim_slip                11.9031 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
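
Most ratio lines in these dumps are formulas over the raw counters printed beside them. The following Python sketch parses `name value # description` lines and recomputes several of those ratios; the helper names are hypothetical. For the run above, it reproduces the reported sim_IPC, sim_CPI, ifq_occupancy, ifq_latency, ruu_full, avg_sim_slip, dl1/ul2 miss rates and direction-prediction rate to four decimals. Some formulas use a different divisor for similar-looking stats, for example `sim_total_insn` for ifq_latency but `sim_num_insn` for avg_sim_slip. These divisors are inferred from the matching values, not taken from the simulator source.

```python
from typing import Dict, Union

Stat = Union[int, float, str]


def parse_stats(text: str) -> Dict[str, Stat]:
    """Parse 'name  value # description' lines from a sim-outorder stats dump."""
    stats: Dict[str, Stat] = {}
    for line in text.splitlines():
        body, sep, _desc = line.partition(" # ")
        fields = body.split(None, 1)
        if not sep or len(fields) != 2 or fields[0].startswith(("#", "-")):
            continue  # blank lines, banners and option echoes
        name, raw = fields[0], fields[1].strip()
        value: Stat
        try:
            value = int(raw, 0)       # 13107124, 0x00400000
        except ValueError:
            try:
                value = float(raw)    # 1.8922, 1013368.0000
            except ValueError:
                value = raw           # '96k', '<error: divide by zero>'
        stats[name] = value
    return stats


def derived(s: Dict[str, Stat]) -> Dict[str, float]:
    """Recompute a few ratios from the raw counters."""
    cycles = s["sim_cycle"]
    return {
        "sim_IPC": s["sim_num_insn"] / cycles,
        "sim_CPI": cycles / s["sim_num_insn"],
        "ifq_occupancy": s["IFQ_count"] / cycles,
        "ifq_latency": s["IFQ_count"] / s["sim_total_insn"],
        "ruu_full": s["RUU_fcount"] / cycles,
        "avg_sim_slip": s["sim_slip"] / s["sim_num_insn"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
        "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
    }


# Excerpt of the run above (lines copied verbatim from the dump).
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_total_insn             13148990 # total number of instructions executed
sim_cycle                   6926755 # total simulation time in cycles
IFQ_count                  25780768 # cumulative IFQ occupancy
RUU_fcount                  5755750 # cumulative RUU full count
sim_slip                  156015409 # total number of slip cycles
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
dl1.accesses                4013174 # total number of accesses
dl1.misses                     5727 # total number of misses
ul2.accesses                   6397 # total number of accesses
ul2.misses                     2268 # total number of misses
mem.page_mem                    96k # total size of memory pages allocated
"""

if __name__ == "__main__":
    for key, val in derived(parse_stats(SAMPLE)).items():
        print(f"{key:16s} {val:.4f}")
```

Non-numeric values such as `96k` and `<error: divide by zero>` are kept as strings rather than dropped, so a caller can tell a missing stat from an undefined one.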

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:54:53 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264921 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6513811 # total simulation time in cycles
sim_IPC                      1.7780 # instructions per cycle
sim_CPI                      0.5624 # cycles per instruction
sim_exec_BW                  1.8829 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  19316873 # cumulative IFQ occupancy
IFQ_fcount                  4001059 # cumulative IFQ full count
ifq_occupancy                2.9655 # avg IFQ occupancy (insns)
ifq_rate                     1.8829 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5750 # avg IFQ occupant latency (cycles)
ifq_full                     0.6142 # fraction of time (cycles) IFQ was full
RUU_count                  79514869 # cumulative RUU occupancy
RUU_fcount                  3399083 # cumulative RUU full count
ruu_occupancy               12.2071 # avg RUU occupancy (insns)
ruu_rate                     1.8829 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4831 # avg RUU occupant latency (cycles)
ruu_full                     0.5218 # fraction of time (cycles) RUU was full
LSQ_count                  32633551 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0099 # avg LSQ occupancy (insns)
lsq_rate                     1.8829 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6607 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  126215529 # total number of slip cycles
avg_sim_slip                10.8980 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen          434901 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820212 # total number of accesses
il1.hits                   12819995 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820212 # total number of accesses
itlb.hits                  12820205 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917906 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
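Most rate statistics in the block above are plain ratios of the raw counters: sim_IPC is sim_num_insn / sim_cycle, sim_exec_BW is sim_total_insn / sim_cycle, each *_occupancy divides a cumulative count by sim_cycle, and each *_latency divides it by sim_total_insn (Little's law: occupancy = dispatch rate x latency). The Python sketch below re-derives a few of them from a dump so they can be checked or compared across runs. The helper names `parse_stats` and `derived` are hypothetical, and the "name value # description" line format is inferred from this output, not from a documented spec.

```python
import re

# One statistic per line: "<name> <value> # <description>".
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text):
    """Return {name: float} for every numeric statistic found in `text`."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            # Skip non-numeric values such as "0x00400000" or "292k".
            continue
    return stats


def derived(stats):
    """Recompute some derived statistics from the raw counters."""
    cycles = stats["sim_cycle"]
    committed = stats["sim_num_insn"]
    executed = stats["sim_total_insn"]  # committed + mis-speculated
    return {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "sim_exec_BW": executed / cycles,
        "sim_IPB": committed / stats["sim_num_branches"],
        # Little's law: occupancy = dispatch rate * residence time.
        "ruu_occupancy": stats["RUU_count"] / cycles,
        "ruu_latency": stats["RUU_count"] / executed,
        "dl1.miss_rate": stats["dl1.misses"] / stats["dl1.accesses"],
    }


# Raw counters copied from the run above (matrix).
sample = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_branches            3128464 # total number of branches committed
sim_total_insn             12264921 # total number of instructions executed
sim_cycle                   6513811 # total simulation time in cycles
RUU_count                  79514869 # cumulative RUU occupancy
dl1.accesses                4497783 # total number of accesses
dl1.misses                    11205 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

for name, value in derived(parse_stats(sample)).items():
    print(f"{name:15s} {value:.4f}")
```

Each printed value agrees with the corresponding line of the dump to four decimal places, which is a quick sanity check that a stat means what its description says.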

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:55:01 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (may be given multiple times)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
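With -mem:width 8 and 64-byte ul2 blocks, a miss to main memory moves 8 bus-width chunks. Assuming the stock sim-outorder model, in which the first chunk costs the first -mem:lat value and each later chunk costs the second, a block fill here takes 80 + 7 x 8 = 136 cycles (91 + 7 x 10 = 161 cycles for the earlier matrix run, which used -mem:lat 91 10). The -mem:maxBurstLength and -mem:minBurstLength options do not appear in the stock 3.0 release as far as we know, so this build may apply a different formula. `block_miss_latency` is a hypothetical name.

```python
def block_miss_latency(block_size, bus_width, first_chunk, inter_chunk):
    """Cycles to move one block over the memory bus (stock sim-outorder model)."""
    # Number of bus-width transfers, rounding up for a partial last chunk.
    chunks = (block_size + bus_width - 1) // bus_width
    # The first chunk pays the full access latency, each later one the inter-chunk gap.
    return first_chunk + inter_chunk * (chunks - 1)


print(block_miss_latency(64, bus_width=8, first_chunk=80, inter_chunk=8))   # -> 136
print(block_miss_latency(64, bus_width=8, first_chunk=91, inter_chunk=10))  # -> 161
```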

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
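A minimal sketch of the range syntax described above: it returns the range kind (address, cycle, or instruction count) and resolves a `+` end relative to the start. The function name, the `symbols` dictionary standing in for the simulator's symbol table, and the use of Python's `int(text, 0)` (decimal or `0x` hex) for numeric values are all assumptions, not SimpleScalar's own parsing rules.

```python
def parse_ptrace_range(spec, symbols=None):
    """Split a -ptrace range such as "#0:#1000" or "@main:+278".

    Returns (kind, start, end): kind is "addr", "cycle" or "insn", and an
    omitted end is None. `symbols` is a hypothetical name -> value map that
    stands in for the simulator's program symbol table.
    """
    symbols = symbols or {}
    kinds = {"@": "addr", "#": "cycle"}

    def value(text):
        # Assumption: decimal or 0x-prefixed hex; anything else is a symbol.
        try:
            return int(text, 0)
        except ValueError:
            return symbols[text]

    start_text, _, end_text = spec.partition(":")
    kind, start, end = "insn", None, None
    if start_text:
        if start_text[0] in kinds:
            kind, start_text = kinds[start_text[0]], start_text[1:]
        start = value(start_text)
    if end_text:
        if end_text[0] == "+":
            if start is None:
                raise ValueError("a '+' end needs a start value")
            end = start + value(end_text[1:])
        else:
            if end_text[0] in kinds:
                kind, end_text = kinds[end_text[0]], end_text[1:]
            end = value(end_text)
    return kind, start, end


print(parse_ptrace_range("1000:+100"))  # -> ('insn', 1000, 1100)
```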

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X (the -bpred:2lev argument order)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
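With X = 1 the history register is XORed with the branch address to index the second level, which is the gshare scheme listed above. The sketch below is a textbook gshare with 2-bit saturating counters, not SimpleScalar's bpred code. The class name, the initial weakly-not-taken counter state, and the `pc >> 3` alignment (8-byte PISA instructions) are assumptions, and the simulator may select index bits differently.

```python
class GsharePredictor:
    """Textbook gshare: a W-bit global history XORed with the branch address
    indexes a table of 2-bit saturating counters (hypothetical sketch)."""

    def __init__(self, hist_width, table_size):
        assert table_size & (table_size - 1) == 0, "table size must be a power of two"
        self.hist_mask = (1 << hist_width) - 1
        self.index_mask = table_size - 1
        self.history = 0
        self.counters = [1] * table_size  # 0-1 predict not taken, 2-3 taken

    def _index(self, pc):
        # Drop the 3 offset bits of an 8-byte PISA instruction, then XOR.
        return ((pc >> 3) ^ self.history) & self.index_mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


# gshare with W = 2 history bits and 2^W = 4 counters.
p = GsharePredictor(hist_width=2, table_size=4)
pc = 0x400140  # hypothetical branch address
for taken in [True, False] * 10:  # train on an alternating branch
    p.update(pc, taken)

hits = 0
for taken in [True, False] * 5:
    hits += p.predict(pc) == taken
    p.update(pc, taken)
print(f"{hits}/10 correct after training")  # -> 10/10 correct after training
```

The history bits let the predictor separate "last outcome taken" from "last outcome not taken"; a bimodal predictor like the one used in these runs (-bpred bimod) has no history and cannot learn a strictly alternating pattern.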

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
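Capacity follows directly from this format: <nsets> x <assoc> x <bsize> bytes. For the configurations used in these runs that gives 64 KiB for dl1:512:64:2:l (and il1), 512 KiB for ul2:1024:64:8:l, and, reading a TLB's block size as its page size, 128 entries covering 512 KiB for dtlb:32:4096:4:l. `CacheConfig` below is a hypothetical helper that parses the string and splits a byte address into tag, set index, and block offset, assuming the set index comes from the address bits just above the block offset.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    """Hypothetical parser for a <name>:<nsets>:<bsize>:<assoc>:<repl> string."""
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB)
    assoc: int
    repl: str   # 'l' = LRU, 'f' = FIFO, 'r' = random

    @classmethod
    def parse(cls, spec):
        name, nsets, bsize, assoc, repl = spec.split(":")
        if repl not in ("l", "f", "r"):
            raise ValueError(f"unknown replacement policy {repl!r}")
        return cls(name, int(nsets), int(bsize), int(assoc), repl)

    @property
    def capacity(self):
        # Total bytes mapped: sets x ways x bytes per block.
        return self.nsets * self.assoc * self.bsize

    def split_address(self, addr):
        """Return (tag, set index, block offset) for a byte address."""
        offset = addr % self.bsize
        index = (addr // self.bsize) % self.nsets
        tag = addr // (self.bsize * self.nsets)
        return tag, index, offset


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = CacheConfig.parse(spec)
    print(f"{cfg.name}: {cfg.capacity // 1024} KiB, {cfg.nsets * cfg.assoc} blocks")
```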

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375553 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10382197 # total simulation time in cycles
sim_IPC                      1.2831 # instructions per cycle
sim_CPI                      0.7794 # cycles per instruction
sim_exec_BW                  1.2883 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  40401790 # cumulative IFQ occupancy
IFQ_fcount                  9949803 # cumulative IFQ full count
ifq_occupancy                3.8914 # avg IFQ occupancy (insns)
ifq_rate                     1.2883 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.0206 # avg IFQ occupant latency (cycles)
ifq_full                     0.9584 # fraction of time (cycles) IFQ was full
RUU_count                 162532508 # cumulative RUU occupancy
RUU_fcount                  9808777 # cumulative RUU full count
ruu_occupancy               15.6549 # avg RUU occupancy (insns)
ruu_rate                     1.2883 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.1515 # avg RUU occupant latency (cycles)
ruu_full                     0.9448 # fraction of time (cycles) RUU was full
LSQ_count                  86492418 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3308 # avg LSQ occupancy (insns)
lsq_rate                     1.2883 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4665 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  268739183 # total number of slip cycles
avg_sim_slip                20.1742 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen           89860 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
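The fft run's dl1 miss rate (0.0466) is nearly 19 times the matrix run's (0.0025), which likely contributes to its lower IPC (1.2831 vs 1.7780). A first-order average memory access time, AMAT = L1 hit time + L1 miss rate x (L2 hit time + L2 miss rate x memory latency), puts a rough number on that gap. This sketch assumes the stock sim-outorder chunked-bus formula for the block fill latency (136 cycles with -mem:lat 80 8, 161 with 91 10) and ignores out-of-order overlap, TLB misses, writebacks, and the instruction misses that also reach ul2, so the result is an estimate, not a measured value. The function names are hypothetical.

```python
def block_miss_latency(block_size, bus_width, first_chunk, inter_chunk):
    """Block fill latency, assuming the stock sim-outorder chunked-bus model."""
    chunks = (block_size + bus_width - 1) // bus_width
    return first_chunk + inter_chunk * (chunks - 1)


def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """First-order average memory access time for a two-level hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# fft run: counts from its statistics block, -mem:lat 80 8, dl1lat 3, dl2lat 12.
fft_amat = amat(3, 287722 / 6169439, 12, 32394 / 431893,
                block_miss_latency(64, 8, 80, 8))
# matrix run: counts from its statistics block, -mem:lat 91 10.
matrix_amat = amat(3, 11205 / 4497783, 12, 4196 / 18501,
                   block_miss_latency(64, 8, 91, 10))
print(f"fft {fft_amat:.2f} cycles, matrix {matrix_amat:.2f} cycles")
```

This gives roughly 4.04 cycles per data access for fft against roughly 3.12 for matrix, despite matrix's higher ul2 miss rate, because so few of matrix's accesses leave dl1 at all.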

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:55:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (may be given multiple times)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12699694 # total simulation time in cycles
sim_IPC                      1.6834 # instructions per cycle
sim_CPI                      0.5940 # cycles per instruction
sim_exec_BW                  1.6890 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49610854 # cumulative IFQ occupancy
IFQ_fcount                 11783541 # cumulative IFQ full count
ifq_occupancy                3.9065 # avg IFQ occupancy (insn's)
ifq_rate                     1.6890 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3128 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 201582689 # cumulative RUU occupancy
RUU_fcount                 12563416 # cumulative RUU full count
ruu_occupancy               15.8730 # avg RUU occupancy (insn's)
ruu_rate                     1.6890 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3977 # avg RUU occupant latency (cycle's)
ruu_full                     0.9893 # fraction of time (cycle's) RUU was full
LSQ_count                  64611523 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0876 # avg LSQ occupancy (insn's)
lsq_rate                     1.6890 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0122 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
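The IFQ, RUU, and LSQ figures above are derived from the cumulative counters and obey Little's law: average occupancy equals dispatch rate times average latency. The Python sketch below recomputes them for this run. The formulas (occupancy = count / sim_cycle, rate = sim_total_insn / sim_cycle, latency = count / sim_total_insn, full = fcount / sim_cycle) are inferred from the printed values, which they reproduce to the four decimal places shown; they are not quoted from the sim-outorder source.

```python
# Cumulative counters copied from this run's statistics.
sim_cycle = 12699694
sim_total_insn = 21450114
queues = {
    # name: (cumulative occupancy count, cumulative full count)
    "ifq": (49610854, 11783541),
    "ruu": (201582689, 12563416),
    "lsq": (64611523, 0),
}

def queue_stats(count: int, fcount: int) -> dict:
    """Recompute the derived per-queue statistics from the raw counters."""
    occupancy = count / sim_cycle        # average entries in the queue per cycle
    rate = sim_total_insn / sim_cycle    # average instructions dispatched per cycle
    latency = count / sim_total_insn     # average cycles an instruction stays queued
    full = fcount / sim_cycle            # fraction of cycles the queue was full
    return {"occupancy": occupancy, "rate": rate, "latency": latency, "full": full}

for name, (count, fcount) in queues.items():
    s = queue_stats(count, fcount)
    # Little's law: occupancy == rate * latency (exact here, up to float rounding).
    assert abs(s["occupancy"] - s["rate"] * s["latency"]) < 1e-9
    print(f"{name}: occupancy={s['occupancy']:.4f} rate={s['rate']:.4f} "
          f"latency={s['latency']:.4f} full={s['full']:.4f}")
```

Because every queue sees the same instruction stream, all three dispatch rates come out equal (1.6890 here); the queues differ only in how long instructions stay in them.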
sim_slip                  293976689 # total number of slip cycles
avg_sim_slip                13.7506 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
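For a unified L2, each ul2 access should originate from an il1 miss, a dl1 miss, or a dirty dl1 block being written back. This run's counters satisfy that exactly (179 + 3131 + 791 = 4101), as do those of the alphaBlend run later in this log (211 + 2572213 + 957274 = 3529698), which is consistent with ul2 being shared by both L1 caches, as with the `-cache:il2 dl2` setting shown for the later runs. The relation is inferred from the numbers rather than documented, so the Python sketch below only reports whether it holds; the counter values are copied from the two runs.

```python
# Counter values copied from the statistics dumps in this log.
runs = {
    "matrix": {"il1.misses": 179, "dl1.misses": 3131,
               "dl1.writebacks": 791, "ul2.accesses": 4101},
    "alphaBlend": {"il1.misses": 211, "dl1.misses": 2572213,
                   "dl1.writebacks": 957274, "ul2.accesses": 3529698},
}

def l2_traffic(s: dict) -> int:
    """Expected ul2 accesses: misses from both L1 caches plus dl1 dirty writebacks."""
    return s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]

for name, s in runs.items():
    expected = l2_traffic(s)
    print(f"{name}: expected {expected}, reported {s['ul2.accesses']}, "
          f"match={expected == s['ul2.accesses']}")
```

A mismatch would hint at traffic this simple model omits, such as prefetches or a write-through L1, neither of which appears in these configurations.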
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
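The statistics dump is line-oriented: a name, a value, then `#` and a description. A minimal Python sketch for loading it into a dictionary and re-deriving a few ratios follows. It assumes every statistic fits on one line, treats values such as `<error: divide by zero>` (printed above for bpred_bimod.bpred_jr_non_ras_rate.PP, whose 0/0 ratio is undefined) as missing, and keeps suffixed values like `120k` as strings. Some counts, such as sim_num_stores, are printed with a fractional part and come back as floats. The sample text is an abridged copy of this run's output.

```python
import re
from typing import Dict, Optional, Union

Value = Optional[Union[int, float, str]]

# "<name>  <value> # <description>"; names start with a letter or '_', so option
# lines ("-seed ...") and commented-out options ("# -config ...") do not match.
_STAT = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>.*?)\s+#\s")

def _convert(raw: str) -> Value:
    if raw.startswith("<error"):
        return None                 # e.g. a rate whose denominator was zero
    try:
        return int(raw, 0)          # decimal or 0x-prefixed hex
    except ValueError:
        pass
    try:
        return float(raw)           # e.g. "1.6834" or "1649960.0000"
    except ValueError:
        return raw                  # e.g. "120k" is kept as text

def parse_stats(text: str) -> Dict[str, Value]:
    stats: Dict[str, Value] = {}
    for line in text.splitlines():
        m = _STAT.match(line)
        if m:
            stats[m.group("name")] = _convert(m.group("value"))
    return stats

def ratio(num: Value, den: Value) -> Optional[float]:
    """num / den, or None when either side is non-numeric or the denominator is 0."""
    if not isinstance(num, (int, float)) or not isinstance(den, (int, float)) or den == 0:
        return None
    return num / den

SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_cycle                  12699694 # total simulation time in cycles
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   120k # total size of memory pages allocated
"""

stats = parse_stats(SAMPLE)
print("IPC          ", ratio(stats["sim_num_insn"], stats["sim_cycle"]))
print("dir-pred rate", ratio(stats["bpred_bimod.dir_hits"], stats["bpred_bimod.updates"]))
print("dl1 miss rate", ratio(stats["dl1.misses"], stats["dl1.accesses"]))
print("non-RAS JR   ", ratio(stats["bpred_bimod.jr_non_ras_hits.PP"],
                             stats["bpred_bimod.jr_non_ras_seen.PP"]))
```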

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:55:26 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
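As an illustration of the range grammar, the following Python sketch splits a range into its kind and numeric endpoints and resolves a `+` end relative to the start. It is not SimpleScalar's own parser: the `symbols` dictionary is a hypothetical stand-in for the program's symbol table, the address mapped to `main` below is made up, and rejecting ranges whose two ends use different prefixes is this sketch's own choice.

```python
from typing import Dict, NamedTuple, Optional, Tuple

# Prefix characters from the help text: '@' = address, '#' = cycle count;
# anything else is an instruction count.
_PREFIX = {"@": "addr", "#": "cycle"}

class Range(NamedTuple):
    kind: str              # "addr", "cycle", or "insn"
    start: Optional[int]   # None: from the start of execution
    end: Optional[int]     # None: to the end of execution

def _split_kind(text: str) -> Tuple[str, str]:
    """Strip an optional '@' or '#' prefix and report which kind it denotes."""
    if text and text[0] in _PREFIX:
        return _PREFIX[text[0]], text[1:]
    return "insn", text

def _number(text: str, symbols: Dict[str, int]) -> int:
    """Decimal or 0x-hex literal, else a program symbol looked up in `symbols`."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)
    return symbols[text]  # raises KeyError for an unknown symbol

def parse_range(spec: str, symbols: Optional[Dict[str, int]] = None) -> Range:
    symbols = symbols or {}
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    kind, start = "insn", None
    if start_txt:
        kind, body = _split_kind(start_txt)
        start = _number(body, symbols)
    end = None
    if end_txt.startswith("+"):
        if start is None:
            raise ValueError("a '+' end needs an explicit start")
        end = start + _number(end_txt[1:], symbols)
    elif end_txt:
        end_kind, body = _split_kind(end_txt)
        if start_txt and end_kind != kind:
            raise ValueError("both ends must be the same kind of range")
        kind, end = end_kind, _number(body, symbols)
    return Range(kind, start, end)

for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
    print(spec, "->", parse_range(spec))
# Hypothetical symbol table; this address for `main` is illustrative only.
print(parse_range("@main:+278", {"main": 0x00400140}))
```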

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X   (argument order of -bpred:2lev)
      N   # entries in first level (# of shift registers)
      M   # entries in 2nd level (# of counters, or other FSMs)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in N, M, W, X order):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
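To make the N, M, W, X parameters concrete, here is a simplified Python model of a generic 2-level predictor, taking its arguments in the order used by `-bpred:2lev <l1size> <l2size> <hist_size> <xor>`. By that reading, the default `-bpred:2lev 1 1024 8 0` listed above is a GAp configuration (one 8-bit global history, 1024 > 2^8 counters). This is an illustrative sketch, not SimpleScalar's bpred.c: the initial counter value (1, weakly not-taken), the 3-bit PC shift, and the exact index formula are assumptions.

```python
class TwoLevelPredictor:
    """Simplified 2-level predictor configured with (N, M, W, X).

    N: number of first-level history (shift) registers
    M: number of second-level 2-bit saturating counters
    W: width of each history register, in bits
    X: 1 to XOR history with branch address bits, 0 to place address bits above it
    """

    PC_SHIFT = 3  # assumption: drop low PC bits (8-byte PISA instructions)

    def __init__(self, n: int, m: int, w: int, x: int):
        for size in (n, m):
            if size <= 0 or size & (size - 1):
                raise ValueError("N and M must be powers of two")
        self.n, self.m, self.w, self.x = n, m, w, x
        self.history = [0] * n      # first level: one shift register per entry
        self.counters = [1] * m     # second level: start weakly not-taken

    def _indices(self, pc: int):
        addr = pc >> self.PC_SHIFT
        l1 = addr & (self.n - 1)                # which history register
        hist = self.history[l1]
        if self.x:
            l2 = hist ^ addr                    # gshare-style XOR
        else:
            l2 = hist | (addr << self.w)        # address bits above the history
        return l1, l2 & (self.m - 1)            # mask to M counters

    def predict(self, pc: int) -> bool:
        _, l2 = self._indices(pc)
        return self.counters[l2] >= 2           # 2 or 3 means "taken"

    def update(self, pc: int, taken: bool) -> None:
        l1, l2 = self._indices(pc)
        c = self.counters[l2]
        self.counters[l2] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.w) - 1
        self.history[l1] = ((self.history[l1] << 1) | int(taken)) & mask

# GAg with W = 2 (N=1, M=2^W=4, X=0) learns a strictly alternating branch.
p = TwoLevelPredictor(1, 4, 2, 0)
pc = 0x00400140
for i in range(20):
    p.update(pc, i % 2 == 0)    # taken, not-taken, taken, ...
print(p.predict(pc))
```

A single bimodal table cannot learn the alternating pattern in the usage example, while even a 2-bit history separates the two cases; that is the main motivation for the second level.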

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
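The geometry of each cache follows directly from its <config> string: capacity is nsets x bsize x assoc bytes. For the caches in these runs, dl1 and il1 (512:64:2) hold 64 KiB each and ul2 (1024:64:8) holds 512 KiB; for a TLB, bsize is the page size, so dtlb:32:4096:4 maps 128 pages (512 KiB of reach). The Python sketch below parses the format and derives those figures. Rejecting non-power-of-two set counts and block sizes mirrors what we believe SimpleScalar's cache module enforces; treat that check as an assumption.

```python
from typing import NamedTuple, Tuple

class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # bytes per block (for a TLB: bytes per page)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity(self) -> int:
        """Total data capacity in bytes (for a TLB: total reach)."""
        return self.nsets * self.bsize * self.assoc

    @property
    def address_bits(self) -> Tuple[int, int]:
        """(set-index bits, block-offset bits) implied by the geometry."""
        return self.nsets.bit_length() - 1, self.bsize.bit_length() - 1

def parse_cache(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)
    for field in ("nsets", "bsize"):
        v = getattr(cfg, field)
        if v <= 0 or v & (v - 1):
            raise ValueError(f"{field} must be a power of two, got {v}")
    if cfg.assoc <= 0 or cfg.repl not in ("l", "f", "r"):
        raise ValueError(f"bad associativity or replacement policy in {spec!r}")
    return cfg

for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache(spec)
    print(f"{cfg.name}: {cfg.capacity // 1024} KiB, {cfg.nsets * cfg.assoc} blocks, "
          f"index/offset bits = {cfg.address_bits}")
```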

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 26 # total simulation time in seconds
sim_inst_rate          1071526.6154 # simulation speed (in insts/sec)
sim_total_insn             27861683 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  40056109 # total simulation time in cycles
sim_IPC                      0.6955 # instructions per cycle
sim_CPI                      1.4378 # cycles per instruction
sim_exec_BW                  0.6956 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 160103305 # cumulative IFQ occupancy
IFQ_fcount                 40025586 # cumulative IFQ full count
ifq_occupancy                3.9970 # avg IFQ occupancy (insn's)
ifq_rate                     0.6956 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.7464 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 640418276 # cumulative RUU occupancy
RUU_fcount                 40024742 # cumulative RUU full count
ruu_occupancy               15.9880 # avg RUU occupancy (insn's)
ruu_rate                     0.6956 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.9856 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 194592083 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8580 # avg LSQ occupancy (insn's)
lsq_rate                     0.6956 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.9842 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  871518802 # total number of slip cycles
avg_sim_slip                31.2824 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
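As a rough cross-check of what the memory hierarchy costs the alphaBlend run, the textbook two-level average memory access time (AMAT) formula can be applied to its dl1 and ul2 counters and to the latencies on its command line. This is a back-of-the-envelope sketch, not something sim-outorder reports. It assumes the memory penalty per block follows the usual SimpleScalar model (first-chunk latency plus the inter-chunk latency for each further bus-width chunk) and ignores this build's -mem:minBurstLength/-mem:maxBurstLength options, writeback traffic, and the overlap that out-of-order execution provides, so the result does not translate directly into CPI.

```python
def block_penalty(block_bytes: int, bus_bytes: int, first: int, inter: int) -> int:
    """Memory cycles for one block: first chunk, then `inter` cycles per extra chunk."""
    chunks = (block_bytes + bus_bytes - 1) // bus_bytes   # whole bus-width chunks
    return first + inter * (chunks - 1)

def amat(l1_hit: float, l1_miss: float, l2_hit: float, l2_miss: float,
         mem: float) -> float:
    """Two-level AMAT = L1 hit + L1 miss rate * (L2 hit + L2 local miss rate * memory)."""
    return l1_hit + l1_miss * (l2_hit + l2_miss * mem)

# alphaBlend run: -cache:dl1lat 3, -cache:dl2lat 12, 64-byte ul2 blocks,
# -mem:lat 80 8, -mem:width 8.
penalty = block_penalty(64, 8, 80, 8)        # 80 + 7 * 8 = 136 cycles
dl1_miss = 2572213 / 8653398                 # dl1.misses / dl1.accesses
ul2_miss = 68321 / 3529698                   # ul2.misses / ul2.accesses (local rate)
print(f"memory penalty per block: {penalty} cycles")
print(f"estimated data-side AMAT: {amat(3, dl1_miss, 12, ul2_miss, penalty):.2f} cycles")
# The matrix run that follows uses -mem:width 16, i.e. 4 chunks per 64-byte block.
print(f"with a 16-byte bus: {block_penalty(64, 16, 80, 8)} cycles per block")
```

With a dl1 miss rate near 30%, the L2 hit latency, not the memory penalty, dominates the estimate for this run.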
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:55:52 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X   (argument order of -bpred:2lev)
      N   # entries in first level (# of shift registers)
      M   # entries in 2nd level (# of counters, or other FSMs)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in N, M, W, X order):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6863939 # total simulation time in cycles
sim_IPC                      1.9096 # instructions per cycle
sim_CPI                      0.5237 # cycles per instruction
sim_exec_BW                  1.9157 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25551904 # cumulative IFQ occupancy
IFQ_fcount                  6262373 # cumulative IFQ full count
ifq_occupancy                3.7226 # avg IFQ occupancy (insn's)
ifq_rate                     1.9157 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9433 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105488081 # cumulative RUU occupancy
RUU_fcount                  5698534 # cumulative RUU full count
ruu_occupancy               15.3684 # avg RUU occupancy (insn's)
ruu_rate                     1.9157 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0225 # avg RUU occupant latency (cycle's)
ruu_full                     0.8302 # fraction of time (cycle's) RUU was full
LSQ_count                  32252602 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6988 # avg LSQ occupancy (insn's)
lsq_rate                     1.9157 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4529 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154791345 # total number of slip cycles
avg_sim_slip                11.8097 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
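
Several lines in this dump are formulas over the raw counters rather than independent measurements. sim_IPC is sim_num_insn / sim_cycle, and sim_CPI is its inverse. sim_IPB is sim_num_insn / sim_num_branches. bpred_dir_rate is dir_hits / updates, and each miss_rate is misses / accesses. The Python sketch below, with hypothetical function names, recomputes these values from an excerpt of the matrix statistics above. It skips values that are not plain numbers, such as "96k", hex addresses, and "<error: divide by zero>".

```python
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_num_branches            1010850 # total number of branches committed
sim_cycle                   6863939 # total simulation time in cycles
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate
dl1.accesses                4013174 # total number of accesses
dl1.misses                     5727 # total number of misses
ul2.accesses                   6397 # total number of accesses
ul2.misses                     2268 # total number of misses
mem.page_mem                    96k # total size of memory pages allocated
"""


def parse_stats(text):
    """Return {stat_name: float} for every 'name value # description' line."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # headers, blank lines, "<error: ...>" values
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            pass  # non-numeric values such as "96k" or hex addresses
    return stats


def derived_metrics(s):
    """Recompute derived statistics from raw counters."""
    return {
        "IPC": s["sim_num_insn"] / s["sim_cycle"],
        "CPI": s["sim_cycle"] / s["sim_num_insn"],
        "IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


if __name__ == "__main__":
    for key, val in derived_metrics(parse_stats(SAMPLE)).items():
        print(f"{key:16s} {val:.4f}")
```

The recomputed values match the logged 1.9096, 0.5237, 12.9664, 0.9899, 0.0014 and 0.3545 to four decimal places. Note also that bpred_bimod.misses is updates - dir_hits (1010850 - 1000659 = 10191). ul2.miss_rate is a local rate, measured per L2 access rather than per instruction or per memory reference.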

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:56:01 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
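
The -mem:lat pair is a first-chunk latency and an inter-chunk latency. Under the usual reading, moving one L2 block to or from memory costs the first-chunk latency plus one inter-chunk latency for each additional bus-width chunk. The Python sketch below (the function name is hypothetical) applies that model. The -mem:maxBurstLength and -mem:minBurstLength options do not appear to be part of stock SimpleScalar 3.0, so how this build uses them is unknown, and the sketch ignores them.

```python
def mem_access_latency(blk_sz, first_chunk, inter_chunk, bus_width):
    """Cycles to transfer one block of blk_sz bytes over a bus_width-byte bus,
    assuming first-chunk + inter-chunk * (chunks - 1). Burst-length limits
    are NOT modelled."""
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_sz + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # 64-byte ul2 blocks (ul2:1024:64:8:l) in both runs
    print("matrix run (-mem:lat 91 10, -mem:width 8): ", mem_access_latency(64, 91, 10, 8))
    print("sort run   (-mem:lat 80 8,  -mem:width 16):", mem_access_latency(64, 80, 8, 16))
```

Under this model, a 64-byte L2 miss costs 91 + 7 x 10 = 161 cycles with the matrix run's memory settings. The same miss costs 80 + 3 x 8 = 104 cycles with this run's settings, on top of the 12-cycle L2 hit latency.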

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264793 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6426685 # total simulation time in cycles
sim_IPC                      1.8021 # instructions per cycle
sim_CPI                      0.5549 # cycles per instruction
sim_exec_BW                  1.9084 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18995635 # cumulative IFQ occupancy
IFQ_fcount                  3920749 # cumulative IFQ full count
ifq_occupancy                2.9557 # avg IFQ occupancy (insn's)
ifq_rate                     1.9084 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5488 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6101 # fraction of time (cycle's) IFQ was full
RUU_count                  78229140 # cumulative RUU occupancy
RUU_fcount                  3318801 # cumulative RUU full count
ruu_occupancy               12.1725 # avg RUU occupancy (insn's)
ruu_rate                     1.9084 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3783 # avg RUU occupant latency (cycle's)
ruu_full                     0.5164 # fraction of time (cycle's) RUU was full
LSQ_count                  32343534 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0327 # avg LSQ occupancy (insn's)
lsq_rate                     1.9084 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6371 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124639770 # total number of slip cycles
avg_sim_slip                10.7620 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
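
The IFQ, RUU and LSQ lines follow from four raw counters through Little's law: average occupancy = dispatch rate x average latency. Here the occupancy is the *_count counter divided by sim_cycle, and the rate is sim_total_insn divided by sim_cycle. The Python sketch below uses hypothetical names and recomputes the sort run's RUU and IFQ figures from those counters.

```python
from typing import NamedTuple


class QueueStats(NamedTuple):
    occupancy: float  # average entries in the queue per cycle
    rate: float       # instructions dispatched per cycle
    latency: float    # average cycles an instruction spends in the queue
    full: float       # fraction of cycles the queue was full


def queue_stats(count, full_count, total_insn, cycles):
    """Recompute the *_occupancy/_rate/_latency/_full lines.

    count:      cumulative occupancy summed over all cycles (e.g. RUU_count)
    full_count: cycles in which the queue was full (e.g. RUU_fcount)
    total_insn: sim_total_insn (committed + mis-speculated)
    cycles:     sim_cycle
    """
    occupancy = count / cycles
    rate = total_insn / cycles
    latency = occupancy / rate  # Little's law: L = lambda * W, so W = L / lambda
    return QueueStats(occupancy, rate, latency, full_count / cycles)


if __name__ == "__main__":
    # Raw counters from the sort run statistics
    print("RUU", queue_stats(78229140, 3318801, 12264793, 6426685))
    print("IFQ", queue_stats(18995635, 3920749, 12264793, 6426685))
```

The same arithmetic helps explain LSQ_fcount = 0. This run uses -ruu:size 16 and -lsq:size 32. If each in-flight load or store also holds an RUU entry, as is assumed here, the LSQ can never hold more than 16 instructions. The RUU therefore fills (ruu_full 0.5164) long before the LSQ could.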

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:56:09 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375297 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9781461 # total simulation time in cycles
sim_IPC                      1.3619 # instructions per cycle
sim_CPI                      0.7343 # cycles per instruction
sim_exec_BW                  1.3674 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  38090241 # cumulative IFQ occupancy
IFQ_fcount                  9371916 # cumulative IFQ full count
ifq_occupancy                3.8941 # avg IFQ occupancy (insns)
ifq_rate                     1.3674 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.8478 # avg IFQ occupant latency (cycles)
ifq_full                     0.9581 # fraction of time (cycles) IFQ was full
RUU_count                 153278906 # cumulative RUU occupancy
RUU_fcount                  9230953 # cumulative RUU full count
ruu_occupancy               15.6703 # avg RUU occupancy (insns)
ruu_rate                     1.3674 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.4599 # avg RUU occupant latency (cycles)
ruu_full                     0.9437 # fraction of time (cycles) RUU was full
LSQ_count                  80985615 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2795 # avg LSQ occupancy (insns)
lsq_rate                     1.3674 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.0549 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  253979418 # total number of slip cycles
avg_sim_slip                19.0662 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen           89860 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
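Most of the rates above are ratios of raw counters printed in the same block. The Python sketch below recomputes several of them for this `matrix` run to make the definitions explicit. The formulas are inferred from the stat descriptions rather than taken from the simulator source, but they agree with the printed values to four decimal places. The occupancy/latency pair follows Little's law: average occupancy divided by dispatch rate gives the average residency in cycles, which reduces to IFQ_count / sim_total_insn.

```python
# Raw counters copied from the `matrix` statistics block above.
raw = {
    "sim_num_insn": 13320908,
    "sim_total_insn": 13375297,
    "sim_num_branches": 387182,
    "sim_cycle": 9781461,
    "IFQ_count": 38090241,
    "IFQ_fcount": 9371916,
    "RUU_count": 153278906,
    "sim_slip": 253979418,
    "dl1.accesses": 6169439,
    "dl1.misses": 287722,
    "ul2.accesses": 431893,
    "ul2.misses": 32394,
}


def derive(s: dict) -> dict:
    cycles = s["sim_cycle"]
    return {
        "sim_IPC": s["sim_num_insn"] / cycles,
        "sim_CPI": cycles / s["sim_num_insn"],
        "sim_exec_BW": s["sim_total_insn"] / cycles,         # includes mis-speculated insts
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "ifq_occupancy": s["IFQ_count"] / cycles,             # avg insts resident per cycle
        "ifq_latency": s["IFQ_count"] / s["sim_total_insn"],  # occupancy / rate (Little's law)
        "ifq_full": s["IFQ_fcount"] / cycles,
        "ruu_occupancy": s["RUU_count"] / cycles,
        "avg_sim_slip": s["sim_slip"] / s["sim_num_insn"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


for name, value in derive(raw).items():
    print(f"{name:14s} {value:10.4f}")
```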

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:56:20 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
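The `-mem:lat 80 8` and `-mem:width 16` options above set the cost of an L2 miss. This sketch assumes the stock sim-outorder model, in which a block moves over the bus as ceil(block size / bus width) chunks. The first chunk costs `<first_chunk>` cycles and each later chunk costs `<inter_chunk>` cycles, so a 64-byte `ul2` block costs 80 + 3 x 8 = 104 cycles. The `-mem:maxBurstLength` and `-mem:minBurstLength` options appear to come from a modified build, and the sketch ignores them. The function name is hypothetical.

```python
def mem_access_latency(block_size: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to fetch one block from memory under a simple chunked-bus model."""
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_size // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes in these runs.
print(mem_access_latency(64, 80, 8, 16))   # 104  (-mem:lat 80 8,  -mem:width 16)
print(mem_access_latency(64, 91, 10, 8))   # 161  (-mem:lat 91 10, -mem:width 8, matrix run)
```

Under this model, the wider bus in this run cuts the per-block memory cost from 161 to 104 cycles compared with the `matrix` run's settings.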

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12627822 # total simulation time in cycles
sim_IPC                      1.6930 # instructions per cycle
sim_CPI                      0.5907 # cycles per instruction
sim_exec_BW                  1.6986 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49345638 # cumulative IFQ occupancy
IFQ_fcount                 11717237 # cumulative IFQ full count
ifq_occupancy                3.9077 # avg IFQ occupancy (insns)
ifq_rate                     1.6986 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3005 # avg IFQ occupant latency (cycles)
ifq_full                     0.9279 # fraction of time (cycles) IFQ was full
RUU_count                 200520257 # cumulative RUU occupancy
RUU_fcount                 12497112 # cumulative RUU full count
ruu_occupancy               15.8792 # avg RUU occupancy (insns)
ruu_rate                     1.6986 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3482 # avg RUU occupant latency (cycles)
ruu_full                     0.9896 # fraction of time (cycles) RUU was full
LSQ_count                  64278339 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0902 # avg LSQ occupancy (insns)
lsq_rate                     1.6986 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9966 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  292581265 # total number of slip cycles
avg_sim_slip                13.6853 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
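The `bpred_bimod.*` counters come from the bimodal predictor selected with `-bpred:bimod 16384`, a table of 16384 two-bit counters (4 KB of state). The minimal Python sketch below models such a predictor. It assumes the PC bits above the 3-bit instruction offset index the table directly, whereas SimpleScalar's actual hash may mix in other address bits. `BimodalPredictor` and `run` are hypothetical names. `dir_rate` mirrors the `bpred_dir_rate` definition above (direction hits / updates).

```python
from typing import Iterable, Tuple


class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, size: int = 16384, pc_shift: int = 3):
        if size <= 0 or size & (size - 1):
            raise ValueError("table size must be a power of two")
        self.mask = size - 1
        self.pc_shift = pc_shift    # assumed: 8-byte PISA instructions
        self.table = [2] * size     # start weakly taken
        self.updates = 0
        self.dir_hits = 0

    def _index(self, pc: int) -> int:
        return (pc >> self.pc_shift) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Score the current prediction, then train the counter toward the outcome."""
        self.updates += 1
        self.dir_hits += self.predict(pc) == taken
        i = self._index(pc)
        self.table[i] = min(self.table[i] + 1, 3) if taken else max(self.table[i] - 1, 0)

    def dir_rate(self) -> float:
        return self.dir_hits / self.updates


def run(trace: Iterable[Tuple[int, bool]], size: int = 16384) -> float:
    bp = BimodalPredictor(size)
    for pc, taken in trace:
        bp.update(pc, taken)
    return bp.dir_rate()


# A loop branch taken 9 times, then not taken once, repeated 10 times.
loop = [(0x00400300, i % 10 != 9) for i in range(100)]
print(run(loop))  # 0.9: only each loop exit is mispredicted
```

The 2-bit hysteresis is the design point here. A single not-taken loop exit moves the counter from 3 to 2, so the next taken iteration is still predicted correctly.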

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:56:33 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
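  The <name>:<nsets>:<bsize>:<assoc>:<repl> string fully determines a
  cache's geometry: capacity is nsets x bsize x assoc bytes, the block
  offset uses log2(bsize) address bits and the set index log2(nsets) bits.
  The C sketch below parses the format with sscanf and computes those
  values. parse_cache_cfg, cache_capacity and log2_pow2 are hypothetical
  helpers, and the validity checks are assumptions rather than the
  simulator's own validation.

```c
#include <stdio.h>

struct cache_cfg {
  char name[32];
  int nsets, bsize, assoc;
  char repl;               /* 'l' = LRU, 'f' = FIFO, 'r' = random */
};

/* Parse "<name>:<nsets>:<bsize>:<assoc>:<repl>".
   Returns 0 on success, -1 if the string does not match the format. */
int parse_cache_cfg(const char *s, struct cache_cfg *c)
{
  char tail;               /* catches trailing garbage after <repl> */
  if (sscanf(s, "%31[^:]:%d:%d:%d:%c%c", c->name, &c->nsets, &c->bsize,
             &c->assoc, &c->repl, &tail) != 5)
    return -1;
  if (c->nsets <= 0 || c->bsize <= 0 || c->assoc <= 0)
    return -1;
  if (c->repl != 'l' && c->repl != 'f' && c->repl != 'r')
    return -1;
  return 0;
}

/* Total capacity in bytes: sets x ways per set x bytes per block. */
long cache_capacity(const struct cache_cfg *c)
{
  return (long)c->nsets * c->assoc * c->bsize;
}

/* log2 of a power of two, for splitting an address into tag/index/offset. */
int log2_pow2(int x)
{
  int n = 0;
  while (x > 1) {
    x >>= 1;
    n++;
  }
  return n;
}
```

  Applied to the dl1:512:64:2:l cache used in these runs, this gives
  65,536 bytes (64 KB) with 6 offset bits and 9 index bits; ul2:1024:64:8:l
  gives 524,288 bytes (512 KB). For a TLB such as dtlb:128:4096:32:r the
  same product (16 MB) is the memory reach of the TLB, not its storage.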



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861427 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  38608781 # total simulation time in cycles
sim_IPC                      0.7216 # instructions per cycle
sim_CPI                      1.3858 # cycles per instruction
sim_exec_BW                  0.7216 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 154340745 # cumulative IFQ occupancy
IFQ_fcount                 38584946 # cumulative IFQ full count
ifq_occupancy                3.9976 # avg IFQ occupancy (insn's)
ifq_rate                     0.7216 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5396 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 617366660 # cumulative RUU occupancy
RUU_fcount                 38584166 # cumulative RUU full count
ruu_occupancy               15.9903 # avg RUU occupancy (insn's)
ruu_rate                     0.7216 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.1585 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 187627251 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8597 # avg LSQ occupancy (insn's)
lsq_rate                     0.7216 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.7343 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  841502482 # total number of slip cycles
avg_sim_slip                30.2050 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
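
Several of the reported ratios can be recomputed from the raw counters
above, which is a useful sanity check when comparing runs. The C sketch
below does this for this run. The struct and function names are
hypothetical, and the formulas (for example RUU latency as RUU_count /
sim_total_insn) are inferred from the fact that they reproduce the printed
values, not copied from the simulator source. A zero denominator returns
-1 instead of dividing, the situation behind the `<error: divide by zero>'
value printed for bpred_jr_non_ras_rate.PP in the next run.

```c
/* Counters copied from a statistics dump (names follow the dump). */
struct run_counts {
  unsigned long long num_insn;     /* sim_num_insn */
  unsigned long long total_insn;   /* sim_total_insn */
  unsigned long long cycles;       /* sim_cycle */
  unsigned long long ruu_count;    /* RUU_count */
  unsigned long long dl1_acc;      /* dl1.accesses */
  unsigned long long dl1_miss;     /* dl1.misses */
  unsigned long long bp_updates;   /* bpred_bimod.updates */
  unsigned long long bp_dir_hits;  /* bpred_bimod.dir_hits */
};

/* Division that flags a zero denominator with -1 instead of failing. */
static double ratio(unsigned long long num, unsigned long long den)
{
  return den ? (double)num / (double)den : -1.0;
}

double ipc(const struct run_counts *r) { return ratio(r->num_insn, r->cycles); }
double cpi(const struct run_counts *r) { return ratio(r->cycles, r->num_insn); }

/* Average RUU occupancy: per-cycle occupancy summed over all cycles / cycles. */
double ruu_occupancy(const struct run_counts *r)
{
  return ratio(r->ruu_count, r->cycles);
}

/* Average RUU residency per executed instruction (Little's law). */
double ruu_latency(const struct run_counts *r)
{
  return ratio(r->ruu_count, r->total_insn);
}

double dl1_miss_rate(const struct run_counts *r)
{
  return ratio(r->dl1_miss, r->dl1_acc);
}

double bpred_dir_rate(const struct run_counts *r)
{
  return ratio(r->bp_dir_hits, r->bp_updates);
}
```

With this run's counters the functions give IPC 0.7216, CPI 1.3858, RUU
occupancy 15.9903, RUU latency 22.1585, dl1 miss rate 0.2972 and bimodal
direction rate 0.9998 after rounding, matching the dump.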

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:56:57 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
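
-mem:lat and -mem:width together set the cost of filling a cache block
from main memory. In stock sim-outorder 3.0 a block moves in
ceil(bsize / width) bus-width chunks, the first costing <first_chunk>
cycles and each later chunk <inter_chunk> cycles. The C sketch below
encodes that model; the function name is hypothetical, and it assumes the
-mem:minBurstLength and -mem:maxBurstLength options of this modified build
do not change the result, which has not been checked against its source.

```c
/* Cycles to transfer one `bsize'-byte block over a `bus_width'-byte bus:
   first chunk costs `first_chunk', each further chunk `inter_chunk'.
   Assumption: burst-length limits of this build are not modelled. */
unsigned mem_block_latency(unsigned bsize, unsigned bus_width,
                           unsigned first_chunk, unsigned inter_chunk)
{
  unsigned chunks = (bsize + bus_width - 1) / bus_width;  /* round up */
  if (chunks == 0)
    chunks = 1;             /* treat a zero-size request as one chunk */
  return first_chunk + (chunks - 1) * inter_chunk;
}
```

For the 64-byte ul2 blocks used here, -mem:width 32 -mem:lat 80 8 gives
80 + 1 x 8 = 88 cycles per fill, while the earlier run's -mem:width 8
-mem:lat 91 10 gives 91 + 7 x 10 = 161 cycles.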


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6832531 # total simulation time in cycles
sim_IPC                      1.9183 # instructions per cycle
sim_CPI                      0.5213 # cycles per instruction
sim_exec_BW                  1.9245 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25437472 # cumulative IFQ occupancy
IFQ_fcount                  6233765 # cumulative IFQ full count
ifq_occupancy                3.7230 # avg IFQ occupancy (insn's)
ifq_rate                     1.9245 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9346 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105029105 # cumulative RUU occupancy
RUU_fcount                  5669926 # cumulative RUU full count
ruu_occupancy               15.3719 # avg RUU occupancy (insn's)
ruu_rate                     1.9245 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9876 # avg RUU occupant latency (cycle's)
ruu_full                     0.8298 # fraction of time (cycle's) RUU was full
LSQ_count                  32099258 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6980 # avg LSQ occupancy (insn's)
lsq_rate                     1.9245 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4412 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154179313 # total number of slip cycles
avg_sim_slip                11.7630 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:57:06 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
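  The <config> string fixes both capacity and geometry: capacity is
  <nsets> x <bsize> x <assoc> bytes, and an address splits into a block offset
  of log2(<bsize>) bits and a set index of log2(<nsets>) bits, with the
  remaining high bits forming the tag. The sketch below parses the format; the
  power-of-two check is an assumption about what the simulator accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str  # 'l' = LRU, 'f' = FIFO, 'r' = random

    @classmethod
    def parse(cls, text):
        name, nsets, bsize, assoc, repl = text.split(":")
        if repl not in ("l", "f", "r"):
            raise ValueError(f"unknown replacement policy {repl!r}")
        cfg = cls(name, int(nsets), int(bsize), int(assoc), repl)
        # Assumption: set count and block size must be powers of two so that
        # address bits can be split with shifts and masks.
        for field, v in (("nsets", cfg.nsets), ("bsize", cfg.bsize)):
            if v <= 0 or v & (v - 1):
                raise ValueError(f"{field}={v} is not a power of two")
        if cfg.assoc <= 0:
            raise ValueError("assoc must be positive")
        return cfg

    @property
    def capacity(self):
        """Total data bytes: sets x block size x ways."""
        return self.nsets * self.bsize * self.assoc

    def split(self, addr):
        """Return (tag, set index, block offset) for a byte address."""
        off_bits = self.bsize.bit_length() - 1
        idx_bits = self.nsets.bit_length() - 1
        offset = addr & (self.bsize - 1)
        index = (addr >> off_bits) & (self.nsets - 1)
        tag = addr >> (off_bits + idx_bits)
        return tag, index, offset
```

  Applied to the configurations used in these runs, dl1:512:64:2:l gives
  64 KiB and ul2:1024:64:8:l gives 512 KiB. The TLB lines appear to reuse the
  same format with the page size in the <bsize> field, so itlb:16:4096:4:l
  would be 64 entries mapping 256 KiB.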

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264729 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6383133 # total simulation time in cycles
sim_IPC                      1.8144 # instructions per cycle
sim_CPI                      0.5511 # cycles per instruction
sim_exec_BW                  1.9214 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18835059 # cumulative IFQ occupancy
IFQ_fcount                  3880605 # cumulative IFQ full count
ifq_occupancy                2.9508 # avg IFQ occupancy (insn's)
ifq_rate                     1.9214 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5357 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6079 # fraction of time (cycle's) IFQ was full
RUU_count                  77586458 # cumulative RUU occupancy
RUU_fcount                  3278673 # cumulative RUU full count
ruu_occupancy               12.1549 # avg RUU occupancy (insn's)
ruu_rate                     1.9214 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3260 # avg RUU occupant latency (cycle's)
ruu_full                     0.5136 # fraction of time (cycle's) RUU was full
LSQ_count                  32198596 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0443 # avg LSQ occupancy (insn's)
lsq_rate                     1.9214 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6253 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123852150 # total number of slip cycles
avg_sim_slip                10.6940 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
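Several rows in the statistics block above appear to be ratios of raw counters
printed to four decimals: sim_IPC is sim_num_insn / sim_cycle, sim_CPI is its
inverse, sim_exec_BW is sim_total_insn / sim_cycle, sim_IPB is
sim_num_insn / sim_num_branches, and the predictor and cache rates follow the
formulas given in their own descriptions. The sketch below (hypothetical helper
names; sample lines copied from the block above) parses the
`name value # description' layout and recomputes those rows as a consistency
check.

```python
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_branches            3128464 # total number of branches committed
sim_total_insn             12264729 # total number of instructions executed
sim_cycle                   6383133 # total simulation time in cycles
sim_IPC                      1.8144 # instructions per cycle
sim_CPI                      0.5511 # cycles per instruction
sim_exec_BW                  1.9214 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
mem.page_mem                   292k # total size of memory pages allocated
"""


def parse_stats(text):
    """Map 'name value # description' lines to floats (non-numeric values kept as str)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # blank lines, banners and multi-value option lines
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            stats[name] = value  # e.g. '292k' or hex segment addresses
    return stats


def recompute(s, bpred="bpred_bimod", l2="ul2"):
    """Rebuild derived rows from the raw counters they appear to be ratios of."""
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_exec_BW": s["sim_total_insn"] / s["sim_cycle"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        f"{bpred}.bpred_dir_rate": s[f"{bpred}.dir_hits"] / s[f"{bpred}.updates"],
        f"{l2}.miss_rate": s[f"{l2}.misses"] / s[f"{l2}.accesses"],
    }


if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    for name, value in recompute(stats).items():
        print(f"{name:28s} recomputed {value:.4f}  reported {stats[name]:.4f}")
```

Because the reported rows are rounded to four decimals, a recomputed value is
expected to agree only to within 0.00005.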

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:57:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
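The -mem:lat pair describes a chunked transfer: <first_chunk> cycles for the
first bus-width piece of a block and <inter_chunk> cycles for each piece after
it, with the piece size set by -mem:width. Assuming that model, and ignoring
the burst-length options (whose effect this log does not describe), the memory
time to fill one 64-byte ul2 block can be sketched as follows.

```python
def mem_block_latency(bsize, bus_width, first_chunk, inter_chunk):
    """Cycles to bring one bsize-byte block over a bus_width-byte memory bus.

    Assumed model: the block is split into bus-width chunks; the first chunk
    costs first_chunk cycles and every later chunk costs inter_chunk cycles.
    Burst-length limits (-mem:minBurstLength/-mem:maxBurstLength) are ignored.
    """
    chunks = (bsize + bus_width - 1) // bus_width   # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes in these runs (ul2:1024:64:8:l).
print(mem_block_latency(64, 32, 80, 8))   # -mem:width 32 -mem:lat 80 8  -> 88
print(mem_block_latency(64, 8, 91, 10))   # -mem:width 8 -mem:lat 91 10  -> 161
```

Under this model the 32-byte bus needs only 2 chunks per block, while an
8-byte bus needs 8, which is why the narrower configuration pays far more per
L2 miss despite a similar first-chunk latency.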


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375172 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9481111 # total simulation time in cycles
sim_IPC                      1.4050 # instructions per cycle
sim_CPI                      0.7117 # cycles per instruction
sim_exec_BW                  1.4107 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36934539 # cumulative IFQ occupancy
IFQ_fcount                  9082990 # cumulative IFQ full count
ifq_occupancy                3.8956 # avg IFQ occupancy (insn's)
ifq_rate                     1.4107 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7614 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 148652253 # cumulative RUU occupancy
RUU_fcount                  8942057 # cumulative RUU full count
ruu_occupancy               15.6788 # avg RUU occupancy (insn's)
ruu_rate                     1.4107 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.1140 # avg RUU occupant latency (cycle's)
ruu_full                     0.9431 # fraction of time (cycle's) RUU was full
LSQ_count                  78232269 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2514 # avg LSQ occupancy (insn's)
lsq_rate                     1.4107 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.8491 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  246599728 # total number of slip cycles
avg_sim_slip                18.5122 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
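The IFQ, RUU and LSQ rows above appear to be derived from one cumulative
occupancy counter each (IFQ_count, RUU_count, LSQ_count): occupancy is the
count divided by sim_cycle, latency is the count divided by sim_total_insn,
and rate is sim_total_insn / sim_cycle, which is why all three *_rate rows
equal sim_exec_BW. The three figures are then tied by Little's law,
occupancy = rate x latency. The helper below is a hypothetical sketch that
recomputes them from the counters in this block.

```python
def queue_stats(cum_count, total_insn, cycles):
    """Derive (occupancy, rate, latency) from a cumulative occupancy counter.

    cum_count is the per-cycle occupancy summed over the run (e.g. RUU_count),
    total_insn is sim_total_insn and cycles is sim_cycle.
    """
    occupancy = cum_count / cycles     # average entries in the structure each cycle
    rate = total_insn / cycles         # instructions passing through per cycle
    latency = cum_count / total_insn   # average cycles an instruction stays inside
    return occupancy, rate, latency


# Counters from the statistics block above.
TOTAL_INSN, CYCLES = 13375172, 9481111
for name, count in (("ifq", 36934539), ("ruu", 148652253), ("lsq", 78232269)):
    occ, rate, lat = queue_stats(count, TOTAL_INSN, CYCLES)
    # Little's law: occupancy = rate x latency (exact here by construction).
    print(f"{name}: occupancy {occ:.4f} = {rate:.4f} x {lat:.4f}")
```

For this run the RUU averages about 15.68 of its 16 entries, consistent with
ruu_full reporting that it was full 94% of the time.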

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:57:25 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
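Every run in this log uses `-bpred bimod`, so the `-bpred:2lev` settings are listed but inactive. For readers who switch predictors, the sketch below shows one way the four parameters can combine into a second-level table index, taking them in the (N, W, M, X) order of the sample predictors. The `-bpred:2lev` option itself takes them as `<l1size> <l2size> <hist_size> <xor>`.

The low address bits select one of N history registers. The W-bit history is then either concatenated with (X = 0) or XORed with (X = 1) the branch address, and the result is masked to M entries. The bit layout follows the description above, but it may differ in detail from SimpleScalar's own predictor code. `BRANCH_SHIFT = 3` and the class name `TwoLevelIndexer` are assumptions. Only indexing is shown; the counters the index selects are omitted.

```python
BRANCH_SHIFT = 3  # assumption: 8-byte PISA instructions, so the low 3 PC bits carry no information


def _is_pow2(v: int) -> bool:
    return v > 0 and (v & (v - 1)) == 0


class TwoLevelIndexer:
    """Second-level index computation for an (N, W, M, X) two-level predictor."""

    def __init__(self, n: int, w: int, m: int, xor: bool) -> None:
        if not (_is_pow2(n) and _is_pow2(m)):
            raise ValueError("N and M must be powers of two")
        if w < 1:
            raise ValueError("W must be at least 1")
        self.n, self.w, self.m, self.xor = n, w, m, xor
        self.history = [0] * n  # N shift registers, each W bits wide

    def _l1_index(self, pc: int) -> int:
        return (pc >> BRANCH_SHIFT) & (self.n - 1)

    def l2_index(self, pc: int) -> int:
        hist = self.history[self._l1_index(pc)]
        addr = pc >> BRANCH_SHIFT
        if self.xor:
            idx = hist ^ addr              # gshare-style: hash history with address
        else:
            idx = hist | (addr << self.w)  # history in the low W bits, address above
        return idx & (self.m - 1)

    def update(self, pc: int, taken: bool) -> None:
        """Shift the branch outcome into its history register."""
        i = self._l1_index(pc)
        self.history[i] = ((self.history[i] << 1) | int(taken)) & ((1 << self.w) - 1)


if __name__ == "__main__":
    pc = 0x00400140  # ld_prog_entry from this log, used only as a sample address
    gshare = TwoLevelIndexer(n=1, w=4, m=16, xor=True)
    print(gshare.l2_index(pc))  # empty history: index comes from address bits only
    gshare.update(pc, taken=True)
    print(gshare.l2_index(pc))
```

With `X = 0` and `M = 2^W` (GAg), the address bits are masked away entirely, so the index is the global history alone.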

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
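A small parser for the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format makes the capacities used in these runs explicit: capacity is nsets * bsize * assoc. The TLB options reuse the same format with bsize as the page size, so nsets * assoc gives the number of entries. `CacheConfig` and `parse_cache_config` are hypothetical names. The parser checks only the field count and the policy letter, not any further constraints the simulator may enforce.

```python
from dataclasses import dataclass

_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc

    @property
    def entries(self) -> int:
        # Number of blocks (cache) or translations (TLB).
        return self.nsets * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in _POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), _POLICIES[repl])


if __name__ == "__main__":
    for spec in ("il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.entries} entries, {cfg.capacity_bytes // 1024} KB, {cfg.repl}")
```

For the configurations in this log, each L1 cache holds 64 KB and the unified L2 holds 512 KB. The 128-entry data TLB maps 512 KB.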

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12591886 # total simulation time in cycles
sim_IPC                      1.6979 # instructions per cycle
sim_CPI                      0.5890 # cycles per instruction
sim_exec_BW                  1.7035 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
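The headline ratios above can be recomputed from the raw counters, which is a useful sanity check when post-processing logs like this one. `sim_IPC`, `sim_CPI` and `sim_IPB` count only committed instructions. `sim_exec_BW` counts every executed instruction, mis-speculated ones included. The function name `derived_rates` is hypothetical, and the inputs are this run's counters.

```python
def derived_rates(num_insn: int, total_insn: int, num_branches: int, cycles: int) -> dict:
    """Recompute sim-outorder's headline ratios from its raw counters."""
    return {
        "sim_IPC": num_insn / cycles,        # committed instructions per cycle
        "sim_CPI": cycles / num_insn,        # reciprocal of IPC
        "sim_exec_BW": total_insn / cycles,  # committed + mis-speculated per cycle
        "sim_IPB": num_insn / num_branches,  # committed instructions per branch
    }


if __name__ == "__main__":
    rates = derived_rates(num_insn=21379256, total_insn=21450114,
                          num_branches=1647342, cycles=12591886)
    for name, value in rates.items():
        print(f"{name:12s} {value:.4f}")
```

Rounded to four places, these match the printed 1.6979, 0.5890, 1.7035 and 12.9780.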
IFQ_count                  49213030 # cumulative IFQ occupancy
IFQ_fcount                 11684085 # cumulative IFQ full count
ifq_occupancy                3.9083 # avg IFQ occupancy (insn's)
ifq_rate                     1.7035 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2943 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199989041 # cumulative RUU occupancy
RUU_fcount                 12463960 # cumulative RUU full count
ruu_occupancy               15.8824 # avg RUU occupancy (insn's)
ruu_rate                     1.7035 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3234 # avg RUU occupant latency (cycle's)
ruu_full                     0.9898 # fraction of time (cycle's) RUU was full
LSQ_count                  64111747 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0915 # avg LSQ occupancy (insn's)
lsq_rate                     1.7035 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9889 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291883553 # total number of slip cycles
avg_sim_slip                13.6527 # the average slip between issue and retirement
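The queue statistics follow one pattern. Each `*_count` counter sums the queue's occupancy over every cycle. Dividing it by `sim_cycle` gives average occupancy, and dividing it by `sim_total_insn` gives average residence time. `*_fcount / sim_cycle` gives the fraction of cycles the queue was full. Residence time is therefore occupancy divided by dispatch rate, which is Little's law.

`avg_sim_slip` is `sim_slip` averaged over committed instructions. The helper names `queue_stats` and `avg_slip` are hypothetical, and the inputs are this run's counters.

```python
def queue_stats(count: int, full_count: int, cycles: int, total_insn: int) -> dict:
    """Derive sim-outorder's IFQ/RUU/LSQ averages from cumulative counters."""
    occupancy = count / cycles   # average entries in use per cycle
    rate = total_insn / cycles   # average dispatch rate (insn/cycle)
    return {
        "occupancy": occupancy,
        "rate": rate,
        "latency": occupancy / rate,  # Little's law: equals count / total_insn
        "full": full_count / cycles,  # fraction of cycles the queue was full
    }


def avg_slip(sim_slip: int, num_insn: int) -> float:
    """Average slip cycles per committed instruction."""
    return sim_slip / num_insn


if __name__ == "__main__":
    cycles, total_insn = 12591886, 21450114
    for name, count, fcount in (("IFQ", 49213030, 11684085),
                                ("RUU", 199989041, 12463960),
                                ("LSQ", 64111747, 0)):
        s = queue_stats(count, fcount, cycles, total_insn)
        print(name, {k: round(v, 4) for k, v in s.items()})
    print("avg_sim_slip", round(avg_slip(291883553, 21379256), 4))
```

Read together, `ruu_full` near 0.99 and `lsq_full` at 0.0 show that the 16-entry RUU, not the 32-entry LSQ, is the window that fills up in this run.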
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
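The predictor rates are plain ratios. One of them shows how the simulator reports a zero denominator: `bpred_jr_non_ras_rate.PP` prints `<error: divide by zero>` because no non-RAS JRs were seen (0/0). A post-processing script has to handle that case explicitly, and the sketch below returns `None` for it.

In both runs in this log, `misses` also equals `updates - dir_hits`: 1647342 - 1639058 = 8284 here, and 481706 - 481589 = 117 in the alphaBlend run. The names `ratio` and `bpred_rates` are hypothetical.

```python
from typing import Dict, Optional


def ratio(num: int, den: int) -> Optional[float]:
    """num/den, or None where sim-outorder prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


def bpred_rates(c: Dict[str, int]) -> Dict[str, Optional[float]]:
    return {
        "bpred_addr_rate": ratio(c["addr_hits"], c["updates"]),
        "bpred_dir_rate": ratio(c["dir_hits"], c["updates"]),
        "bpred_jr_rate": ratio(c["jr_hits"], c["jr_seen"]),
        "bpred_jr_non_ras_rate": ratio(c["jr_non_ras_hits"], c["jr_non_ras_seen"]),
        "ras_rate": ratio(c["ras_hits"], c["used_ras"]),
    }


if __name__ == "__main__":
    counters = {"updates": 1647342, "addr_hits": 1638967, "dir_hits": 1639058,
                "jr_hits": 45, "jr_seen": 52, "jr_non_ras_hits": 0,
                "jr_non_ras_seen": 0, "ras_hits": 45, "used_ras": 52}
    for name, value in bpred_rates(counters).items():
        print(name, "n/a (0/0)" if value is None else f"{value:.4f}")
    # Direction misses, consistent with bpred_bimod.misses in this log.
    print("misses", counters["updates"] - counters["dir_hits"])
```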
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
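The cache and TLB rates are all per-access ratios, and the counters also cross-check each other. With the unified L2 used here (`-cache:il2 dl2`), `ul2.accesses` in both runs of this excerpt equals the L1 misses plus the L1 data writebacks. Here that is 179 + 3131 + 791 = 4101, and in the alphaBlend run 211 + 2572213 + 957274 = 3529698. This identity is an observation from this log, not a documented guarantee.

The same numbers put the high `ul2.miss_rate` of 0.6069 in context: it is a local rate over only 4101 L2 accesses. `CacheStats` and `expected_l2_accesses` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheStats:
    accesses: int
    hits: int
    misses: int
    writebacks: int = 0

    def __post_init__(self) -> None:
        if self.hits + self.misses != self.accesses:
            raise ValueError("hits + misses must equal accesses")

    def per_ref(self, count: int) -> float:
        """Rate per access, as in miss_rate / wb_rate (0.0 for an unused cache)."""
        return count / self.accesses if self.accesses else 0.0

    @property
    def miss_rate(self) -> float:
        return self.per_ref(self.misses)


def expected_l2_accesses(il1: CacheStats, dl1: CacheStats) -> int:
    """L1 misses plus L1 writebacks: the traffic the unified L2 saw in this log."""
    return il1.misses + il1.writebacks + dl1.misses + dl1.writebacks


if __name__ == "__main__":
    il1 = CacheStats(accesses=21483054, hits=21482875, misses=179)
    dl1 = CacheStats(accesses=4943189, hits=4940058, misses=3131, writebacks=791)
    ul2 = CacheStats(accesses=4101, hits=1612, misses=2489)
    print(expected_l2_accesses(il1, dl1), ul2.accesses)  # 4101 4101
    print(f"local L2 miss rate: {ul2.miss_rate:.4f}")
    # Global rate: L2 misses per L1 reference, a far smaller number.
    print(f"global L2 miss rate: {ul2.misses / (il1.accesses + dl1.accesses):.6f}")
```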
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
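The `mem.*` lines are also simple to reproduce. In both runs, `mem.page_mem` equals `mem.page_count` times 4 KB: 30 pages and 120k here, 370 pages and 1480k in the alphaBlend run. The page size below is therefore inferred from the log rather than taken from documentation. `mem.ptab_miss_rate` is `mem.ptab_misses / mem.ptab_accesses`. The helper names are hypothetical.

```python
PAGE_BYTES = 4096  # inferred from this log (page_mem / page_count), not from documentation


def page_mem_kib(page_count: int) -> int:
    """Reproduce mem.page_mem (in KB) from mem.page_count."""
    return page_count * PAGE_BYTES // 1024


def ptab_miss_rate(misses: int, accesses: int) -> float:
    """First-level page table miss rate, as printed in mem.ptab_miss_rate."""
    return misses / accesses


if __name__ == "__main__":
    print(f"{page_mem_kib(30)}k", f"{ptab_miss_rate(12303390, 178886464):.4f}")
```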

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:57:39 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced; those that start with `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861299 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37885117 # total simulation time in cycles
sim_IPC                      0.7354 # instructions per cycle
sim_CPI                      1.3599 # cycles per instruction
sim_exec_BW                  0.7354 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 151459465 # cumulative IFQ occupancy
IFQ_fcount                 37864626 # cumulative IFQ full count
ifq_occupancy                3.9979 # avg IFQ occupancy (insn's)
ifq_rate                     0.7354 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4362 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 605840852 # cumulative RUU occupancy
RUU_fcount                 37863878 # cumulative RUU full count
ruu_occupancy               15.9915 # avg RUU occupancy (insn's)
ruu_rate                     0.7354 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.7449 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 184144835 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8606 # avg LSQ occupancy (insn's)
lsq_rate                     0.7354 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6093 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  826494322 # total number of slip cycles
avg_sim_slip                29.6663 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
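For a rough sense of what the alphaBlend run's 29.7% dl1 miss rate costs, the textbook average-memory-access-time formula AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem) can be applied to the data side. This is a swapped-in analytic model, not how sim-outorder computes time, for three reasons:

- The simulator overlaps misses with other work.
- It charges TLB misses and writebacks separately.
- Its L2 miss rate mixes instruction, data and writeback traffic.

The estimate will therefore not reproduce `sim_CPI`. The memory latency assumes the stock chunk model: 80 + 8 * (64/32 - 1) = 88 cycles for a 64-byte block on this run's 32-byte bus, with the burst-length options ignored.

```python
def amat(t_l1: float, m_l1: float, t_l2: float, m_l2: float, t_mem: float) -> float:
    """Textbook two-level average memory access time, in cycles."""
    return t_l1 + m_l1 * (t_l2 + m_l2 * t_mem)


if __name__ == "__main__":
    # Counters from the alphaBlend run; latencies from its -cache:*lat and -mem:* options.
    m_dl1 = 2572213 / 8653398        # dl1.misses / dl1.accesses
    m_ul2 = 68321 / 3529698          # ul2.misses / ul2.accesses (local rate)
    t_mem = 80 + 8 * (64 // 32 - 1)  # assumed stock chunk model: 88 cycles
    print(f"estimated data AMAT: {amat(3, m_dl1, 12, m_ul2, t_mem):.2f} cycles")
```

The estimate comes to roughly 7 cycles per data access, against 3 on a hit. That points in the same direction as alphaBlend's sim_CPI of 1.3599, versus 0.5890 for the preceding run, whose dl1 miss rate was 0.0006.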
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:58:02 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced; those that start with `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
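A small parser makes the range grammar concrete. This is a sketch, not SimpleScalar's own range code, and it rests on several assumptions:

- numbers are decimal or `0x` hex;
- symbols such as `main` are left unresolved, because resolving them needs the program's symbol table;
- end points of different kinds are rejected, which is an assumption about how mixed ranges are handled.

`Pos` and `parse_range` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

PREFIX_KIND = {"@": "addr", "#": "cycle"}  # no prefix -> instruction count


@dataclass(frozen=True)
class Pos:
    kind: str                  # "addr", "cycle" or "insn"
    base: Union[int, str]      # a number, or an unresolved program symbol such as "main"
    offset: int = 0            # offset from a "+N" end point when base is a symbol


def _value(text: str) -> Union[int, str]:
    if not text:
        raise ValueError("empty range value")
    try:
        return int(text, 0)    # decimal or 0x-prefixed hex (assumed accepted forms)
    except ValueError:
        return text            # otherwise treat it as a program symbol


def _pos(text: str) -> Pos:
    kind = PREFIX_KIND.get(text[0], "insn")
    body = text[1:] if text[0] in PREFIX_KIND else text
    return Pos(kind, _value(body))


def parse_range(spec: str) -> Tuple[Optional[Pos], Optional[Pos]]:
    """Split a pipetrace range into (start, end); None means that end is open."""
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = _pos(start_txt) if start_txt else None
    if not end_txt:
        return start, None
    if end_txt[0] == "+":
        if start is None:
            raise ValueError("a '+' end point needs an explicit start")
        delta = _value(end_txt[1:])
        if not isinstance(delta, int):
            raise ValueError("a '+' offset must be numeric")
        if isinstance(start.base, int):
            return start, Pos(start.kind, start.base + delta)   # 1000:+100 == 1000:1100
        return start, Pos(start.kind, start.base, delta)        # symbol + offset
    end = _pos(end_txt)
    if start is not None and start.kind != end.kind:
        raise ValueError("range end points must be of the same kind")
    return start, end


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
    print(f"{spec:12s} -> {parse_range(spec)}")
```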

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
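To make N, W, M and X concrete, the sketch below indexes a 2-level predictor. The branch address selects one of N history registers. The W-bit history, optionally XORed with address bits, then selects one of M 2-bit counters. The sketch makes three assumptions, each noted in the code:

- PC bits are shifted right by 3 (8-byte PISA instructions);
- address bits fill the index above the history bits;
- counters start weakly not-taken.

`TwoLevel` is a hypothetical class, not SimpleScalar's bpred API.

```python
BR_SHIFT = 3  # assumed: PISA instructions are 8 bytes, so the low 3 PC bits are dropped


class TwoLevel:
    """Sketch of a 2-level predictor configured as <l1size> <l2size> <hist_size> <xor>."""

    def __init__(self, l1size: int, l2size: int, hist_width: int, xor: bool):
        for n in (l1size, l2size):
            if n <= 0 or n & (n - 1):
                raise ValueError("table sizes must be powers of two")
        self.l1size, self.l2size = l1size, l2size
        self.w, self.xor = hist_width, xor
        self.hist = [0] * l1size   # first level (N): branch history shift registers
        self.ctr = [1] * l2size    # second level (M): 2-bit counters, assumed weakly not-taken

    def _l2index(self, pc: int) -> int:
        a = pc >> BR_SHIFT
        h = self.hist[a & (self.l1size - 1)]
        mask = (1 << self.w) - 1
        if self.xor:   # X = 1 (gshare-style): mix history with low address bits
            idx = ((h ^ a) & mask) | (a << self.w)
        else:          # X = 0: place address bits above the history bits
            idx = h | (a << self.w)
        return idx & (self.l2size - 1)

    def predict(self, pc: int) -> bool:
        return self.ctr[self._l2index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._l2index(pc)
        self.ctr[i] = min(3, self.ctr[i] + 1) if taken else max(0, self.ctr[i] - 1)
        j = (pc >> BR_SHIFT) & (self.l1size - 1)
        self.hist[j] = ((self.hist[j] << 1) | int(taken)) & ((1 << self.w) - 1)


# The -bpred:2lev default in these runs is "1 1024 8 0": one global history register.
p = TwoLevel(1, 1024, 8, False)
pc = 0x00400140
misses = 0
for i in range(100):
    taken = i % 2 == 0             # strictly alternating branch
    if i >= 50 and p.predict(pc) != taken:
        misses += 1
    p.update(pc, taken)
print(misses)  # 0: each of the two history patterns has trained its own counter
```

With N = 1 the history register is global (the GAg, GAp and gshare rows); N > 1 gives per-address histories (PAg, PAp).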

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
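The `<config>` string fully determines capacity: `<nsets>` × `<assoc>` blocks of `<bsize>` bytes each. This minimal parser requires power-of-two set counts and block sizes, because those fields become bit-field indexes. For a TLB, `<bsize>` is the page size, so the same product gives the TLB's address reach. `CacheConfig` is a hypothetical helper.

```python
from dataclasses import dataclass

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @classmethod
    def parse(cls, text: str) -> "CacheConfig":
        fields = text.split(":")
        if len(fields) != 5:
            raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {text!r}")
        name, nsets, bsize, assoc, repl = fields
        cfg = cls(name, int(nsets), int(bsize), int(assoc), repl)
        for label, v in (("nsets", cfg.nsets), ("bsize", cfg.bsize)):
            if v <= 0 or v & (v - 1):
                raise ValueError(f"{label} must be a power of two, got {v}")
        if cfg.assoc <= 0 or repl not in REPL:
            raise ValueError(f"bad associativity or replacement policy in {text!r}")
        return cfg

    @property
    def blocks(self) -> int:
        return self.nsets * self.assoc           # total lines (total entries for a TLB)

    @property
    def size_bytes(self) -> int:
        return self.blocks * self.bsize          # capacity (address reach for a TLB)

    def set_index(self, addr: int) -> int:
        return (addr // self.bsize) % self.nsets  # drop offset bits, keep index bits


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = CacheConfig.parse(spec)
    print(c.name, c.blocks, c.size_bytes // 1024, "KiB", REPL[c.repl])
# dl1 1024 64 KiB LRU
# ul2 8192 512 KiB LRU
# dtlb 128 512 KiB LRU
```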

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
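When il2 is pointed at dl2, as in the runs in this log, both L1 caches feed one ul2, so its access counter mixes instruction and data traffic. Assume each L1 miss fetches one block from the next level, and each dirty dl1 eviction writes one block back. Then ul2 accesses should equal il1 misses + dl1 misses + dl1 write-backs. The counters recorded for the matrix and sort runs in this log add up exactly that way. The helper name is hypothetical.

```python
def expected_l2_accesses(il1_misses: int, dl1_misses: int, dl1_writebacks: int) -> int:
    """Accesses reaching a unified L2 when il2 is pointed at dl2 (assumed model)."""
    # Every L1 miss fetches one block from the shared level, and every dirty
    # block evicted from dl1 is written back to it; il1 holds no dirty data,
    # so it adds no write-backs (il1.writebacks is 0 in both runs).
    return il1_misses + dl1_misses + dl1_writebacks


# Counters copied from the matrix and sort statistics in this log.
runs = {
    "matrix": dict(il1_misses=178, dl1_misses=5727, dl1_writebacks=492, ul2_accesses=6397),
    "sort": dict(il1_misses=217, dl1_misses=11205, dl1_writebacks=7079, ul2_accesses=18501),
}
for name, s in runs.items():
    predicted = expected_l2_accesses(s["il1_misses"], s["dl1_misses"], s["dl1_writebacks"])
    print(name, predicted, s["ul2_accesses"], predicted == s["ul2_accesses"])
# matrix 6397 6397 True
# sort 18501 18501 True
```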



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6816827 # total simulation time in cycles
sim_IPC                      1.9228 # instructions per cycle
sim_CPI                      0.5201 # cycles per instruction
sim_exec_BW                  1.9289 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25380256 # cumulative IFQ occupancy
IFQ_fcount                  6219461 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9289 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9302 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104799617 # cumulative RUU occupancy
RUU_fcount                  5655622 # cumulative RUU full count
ruu_occupancy               15.3737 # avg RUU occupancy (insn's)
ruu_rate                     1.9289 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9702 # avg RUU occupant latency (cycle's)
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
LSQ_count                  32022586 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6976 # avg LSQ occupancy (insn's)
lsq_rate                     1.9289 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4354 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153873297 # total number of slip cycles
avg_sim_slip                11.7397 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
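Most rate statistics are ratios of the raw counters printed alongside them, for example:

- `sim_IPC = sim_num_insn / sim_cycle`
- `ifq_occupancy = IFQ_count / sim_cycle`
- `ifq_latency = IFQ_count / sim_total_insn`

With `ifq_rate = sim_total_insn / sim_cycle`, occupancy = rate × latency, which is Little's law. The sketch below parses a statistics block and recomputes a few of these ratios. The definitions are inferred because they reproduce the values in this log, not quoted from the simulator source. `parse_stats` and `derived` are hypothetical names.

```python
import re
from typing import Dict, Union

Stat = Union[int, float, str]
# "<name>  <value> # <description>"; the value may itself contain spaces.
_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#\s")
_MARKER = "sim: ** simulation statistics **"


def _convert(text: str) -> Stat:
    for conv in (int, float):
        try:
            return conv(text)
        except ValueError:
            pass
    return text  # e.g. "0x00400000", "96k", "<error: divide by zero>"


def parse_stats(dump: str) -> Dict[str, Stat]:
    """Map stat name -> value for the lines after the statistics marker."""
    _, found, tail = dump.partition(_MARKER)
    stats: Dict[str, Stat] = {}
    for line in (tail if found else dump).splitlines():
        m = _LINE.match(line)
        if m:
            stats[m.group(1)] = _convert(m.group(2))
    return stats


def derived(s: Dict[str, Stat]) -> Dict[str, float]:
    """Recompute a few reported rates from raw counters."""
    cycles, committed, executed = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    return {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "ifq_occupancy": s["IFQ_count"] / cycles,   # mean IFQ entries per cycle
        "ifq_latency": s["IFQ_count"] / executed,   # mean cycles an instruction spends there
        "ruu_full": s["RUU_fcount"] / cycles,       # fraction of cycles the RUU was full
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    }


# Lines copied from the matrix run's statistics.
excerpt = """sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_total_insn             13148990 # total number of instructions executed
sim_cycle                   6816827 # total simulation time in cycles
sim_IPC                      1.9228 # instructions per cycle
sim_CPI                      0.5201 # cycles per instruction
IFQ_count                  25380256 # cumulative IFQ occupancy
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_latency                  1.9302 # avg IFQ occupant latency (cycle's)
RUU_fcount                  5655622 # cumulative RUU full count
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
dl1.accesses                4013174 # total number of accesses
dl1.misses                     5727 # total number of misses
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
"""
stats = parse_stats(excerpt)
for name, value in derived(stats).items():
    print(f"{name:15s} recomputed {value:.4f}  reported {stats[name]}")
```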

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:58:10 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264697 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6361357 # total simulation time in cycles
sim_IPC                      1.8206 # instructions per cycle
sim_CPI                      0.5493 # cycles per instruction
sim_exec_BW                  1.9280 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18754771 # cumulative IFQ occupancy
IFQ_fcount                  3860533 # cumulative IFQ full count
ifq_occupancy                2.9482 # avg IFQ occupancy (insn's)
ifq_rate                     1.9280 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5292 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6069 # fraction of time (cycle's) IFQ was full
RUU_count                  77265138 # cumulative RUU occupancy
RUU_fcount                  3258609 # cumulative RUU full count
ruu_occupancy               12.1460 # avg RUU occupancy (insn's)
ruu_rate                     1.9280 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2998 # avg RUU occupant latency (cycle's)
ruu_full                     0.5123 # fraction of time (cycle's) RUU was full
LSQ_count                  32126140 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0502 # avg LSQ occupancy (insn's)
lsq_rate                     1.9280 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6194 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123458374 # total number of slip cycles
avg_sim_slip                10.6600 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
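The branch-predictor rates for the sort run can be recomputed from the raw counters printed above them. The same arithmetic explains the matrix run's `<error: divide by zero>` line: that run saw no non-RAS JRs, so that rate's denominator is 0. In both runs, `misses` equals `updates - dir_hits`. This is a sketch with hypothetical helper names, and `None` stands in for the divide-by-zero case.

```python
from typing import Dict, Optional


def ratio(num: int, den: int) -> Optional[float]:
    # None stands in for sim-outorder's "<error: divide by zero>" output.
    return num / den if den else None


def bpred_rates(c: Dict[str, int]) -> Dict[str, Optional[float]]:
    return {
        "misses": c["updates"] - c["dir_hits"],   # committed branches not direction-hit
        "bpred_addr_rate": ratio(c["addr_hits"], c["updates"]),
        "bpred_dir_rate": ratio(c["dir_hits"], c["updates"]),
        "bpred_jr_rate": ratio(c["jr_hits"], c["jr_seen"]),
        # Dividing by non-RAS JRs seen is what reproduces the sort run's 0.1493.
        "bpred_jr_non_ras_rate.PP": ratio(c["jr_non_ras_hits.PP"], c["jr_non_ras_seen.PP"]),
        "ras_rate.PP": ratio(c["ras_hits.PP"], c["used_ras.PP"]),
    }


# Counters copied from the bpred_bimod sections of the sort and matrix runs.
sort = {"updates": 3128464, "addr_hits": 2984765, "dir_hits": 2990777,
        "jr_hits": 427904, "jr_seen": 434901,
        "jr_non_ras_hits.PP": 1223, "jr_non_ras_seen.PP": 8193,
        "ras_hits.PP": 426681, "used_ras.PP": 426708}
matrix = {"updates": 1010850, "addr_hits": 1000567, "dir_hits": 1000659,
          "jr_hits": 46, "jr_seen": 52,
          "jr_non_ras_hits.PP": 0, "jr_non_ras_seen.PP": 0,
          "ras_hits.PP": 46, "used_ras.PP": 52}

for name, counts in (("sort", sort), ("matrix", matrix)):
    for key, value in bpred_rates(counts).items():
        if value is None:
            shown = "<error: divide by zero>"
        elif key == "misses":
            shown = str(value)
        else:
            shown = f"{value:.4f}"
        print(name, key, shown)
```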

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:58:18 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
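The `-mem:lat` and `-mem:width` settings above determine how long a block fill from main memory takes. The sketch below assumes the stock sim-outorder model. In that model a block crosses the bus in ceil(block size / bus width) chunks. The first chunk costs `<first_chunk>` cycles and each later chunk costs `<inter_chunk>` cycles. This build's `-mem:maxBurstLength` and `-mem:minBurstLength` options are not part of that model and are ignored here. The function name is hypothetical.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one cache block from main memory (simplified model).

    Assumes the stock sim-outorder chunked-bus model; this build's
    -mem:maxBurstLength / -mem:minBurstLength options are NOT modelled.
    """
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    # Number of bus transfers needed, rounded up (ceiling division).
    chunks = -(-block_size // bus_width)
    # The first chunk pays the full access latency; the rest stream behind it.
    return first_chunk + inter_chunk * (chunks - 1)


# Settings from this run: 64-byte ul2 blocks, 64-byte bus, -mem:lat 80 8.
print(mem_access_latency(64, 64, 80, 8))   # 80 (one chunk)
# The same block over an 8-byte bus would need eight chunks.
print(mem_access_latency(64, 8, 80, 8))    # 136
```

Under this model, the 64-byte ul2 blocks and 64-byte bus used here make every memory fill a single 80-cycle chunk, so the 8-cycle inter-chunk latency never applies.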

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
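The range grammar above can be captured in a small parser. This is a sketch, not SimpleScalar's own range code. The names `Point` and `parse_ptrace_range` are hypothetical. Decimal and `0x` hex numbers are accepted, and any other token (such as `main`) is kept as an unresolved program symbol. A `+` end point inherits the start's kind and is added to the start value when both are numbers.

```python
from typing import NamedTuple, Optional, Tuple, Union

Value = Union[int, str]


class Point(NamedTuple):
    kind: str     # "addr" for @, "cycle" for #, "inst" for a plain count
    value: Value  # an integer, or an unresolved program symbol such as "main"


def _parse_value(text: str) -> Value:
    if text.isdigit():
        return int(text)
    if text.lower().startswith("0x"):
        return int(text, 16)
    return text  # treat anything else as a program symbol


def _parse_point(text: str) -> Optional[Point]:
    if not text:
        return None  # this end of the range was omitted
    if text[0] == "@":
        return Point("addr", _parse_value(text[1:]))
    if text[0] == "#":
        return Point("cycle", _parse_value(text[1:]))
    return Point("inst", _parse_value(text))


def parse_ptrace_range(spec: str) -> Tuple[Optional[Point], Optional[Point]]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"pipetrace range needs a ':': {spec!r}")
    start = _parse_point(start_text)
    if not end_text.startswith("+"):
        return start, _parse_point(end_text)
    # A '+' end is relative to the start and inherits its kind.
    if start is None:
        raise ValueError("a relative end ('+') requires a start")
    offset = _parse_value(end_text[1:])
    if isinstance(start.value, int) and isinstance(offset, int):
        return start, Point(start.kind, start.value + offset)
    # Symbols cannot be added without a symbol table; keep the expression.
    return start, Point(start.kind, f"{start.value}+{offset}")


print(parse_ptrace_range("1000:+100"))
# (Point(kind='inst', value=1000), Point(kind='inst', value=1100))
```

Resolving symbols such as `main` to addresses would need the program's symbol table, which this sketch deliberately leaves out.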

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (values listed in N, W, M, X order):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
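The following sketch shows how N, W, M and X interact in a two-level predictor's index arithmetic. It makes three assumptions. Table sizes are powers of two. The low 3 PC bits are dropped because PISA instructions are 8 bytes. The XOR variant is a gshare-style scheme, not code copied from the simulator. `BR_SHIFT`, `l1_index`, `update_history` and `l2_index` are hypothetical names.

```python
BR_SHIFT = 3  # assumption: drop the 3 low PC bits of 8-byte PISA instructions


def l1_index(pc: int, n: int) -> int:
    """Pick one of the N first-level shift registers (N == 1: global history)."""
    return (pc >> BR_SHIFT) & (n - 1)


def update_history(history: int, taken: bool, w: int) -> int:
    """Shift the latest outcome into a W-bit history register."""
    return ((history << 1) | int(taken)) & ((1 << w) - 1)


def l2_index(history: int, pc: int, w: int, m: int, xor: bool) -> int:
    """Index into the M second-level counters."""
    pc_bits = pc >> BR_SHIFT
    if xor:
        # gshare-style: fold address bits into the low W history bits.
        idx = ((history ^ pc_bits) & ((1 << w) - 1)) | (pc_bits << w)
    else:
        # Concatenate: history in the low W bits, address bits above them.
        idx = history | (pc_bits << w)
    # With M == 2^W the address bits above bit W are masked away (GAg/PAg);
    # with M > 2^W some survive and select a per-address table (GAp/PAp).
    return idx & (m - 1)


pc = 0x400148  # hypothetical branch address
print(l2_index(0b1011, pc, w=4, m=16, xor=False))  # GAg-like: history only -> 11
print(l2_index(0b1011, pc, w=4, m=64, xor=False))  # GAp-like: address bits appear -> 27
print(l2_index(0b1011, pc, w=4, m=16, xor=True))   # gshare-like -> 2
```

Read in the command-line order N, M, W, X, the `-bpred:2lev 1 1024 8 0` default in these option dumps is a GAp configuration, because 1024 > 2^8.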

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
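The `<config>` format above fixes a cache's geometry, so capacity and address splitting follow directly from it. The sketch below parses a config string, computes capacity, and splits an address into tag, set index and block offset. The capacity is nsets x assoc x bsize, or the reach for a TLB, where bsize is the page size. The names are hypothetical. The power-of-two checks on nsets and bsize are this sketch's own requirement.

```python
from typing import NamedTuple, Tuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # bytes per block (page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def size_bytes(self) -> int:
        # Capacity (or TLB reach) = sets x ways x block size.
        return self.nsets * self.assoc * self.bsize

    def split_address(self, addr: int) -> Tuple[int, int, int]:
        """Return (tag, set index, block offset) for a byte address."""
        offset_bits = self.bsize.bit_length() - 1   # log2(bsize)
        index_bits = self.nsets.bit_length() - 1    # log2(nsets)
        offset = addr & (self.bsize - 1)
        index = (addr >> offset_bits) & (self.nsets - 1)
        tag = addr >> (offset_bits + index_bits)
        return tag, index, offset


def _is_pow2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields: {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)) or cfg.assoc < 1:
        raise ValueError("nsets and bsize must be powers of two, assoc >= 1")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return cfg


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.size_bytes // 1024, "KB")
# dl1 64 KB
# ul2 512 KB
# dtlb 512 KB
```

For the dl1 used in these runs, 64-byte blocks give 6 offset bits and 512 sets give 9 index bits, so address 0x10000040 maps to set 1 at offset 0.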

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375108 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9330943 # total simulation time in cycles
sim_IPC                      1.4276 # instructions per cycle
sim_CPI                      0.7005 # cycles per instruction
sim_exec_BW                  1.4334 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36356715 # cumulative IFQ occupancy
IFQ_fcount                  8938534 # cumulative IFQ full count
ifq_occupancy                3.8964 # avg IFQ occupancy (insn's)
ifq_rate                     1.4334 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7182 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 146339109 # cumulative RUU occupancy
RUU_fcount                  8797617 # cumulative RUU full count
ruu_occupancy               15.6832 # avg RUU occupancy (insn's)
ruu_rate                     1.4334 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.9412 # avg RUU occupant latency (cycle's)
ruu_full                     0.9428 # fraction of time (cycle's) RUU was full
LSQ_count                  76855663 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2366 # avg LSQ occupancy (insn's)
lsq_rate                     1.4334 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7462 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  242910138 # total number of slip cycles
avg_sim_slip                18.2353 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
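The derived lines in this report can be recomputed from the raw counters. The sketch below parses `name value # description` lines and rebuilds a few of them. `parse_stats` and `derived_metrics` are hypothetical helpers. The formulas are inferred from the stat descriptions and checked against this run's printed values (for example, sim_IPC = sim_num_insn / sim_cycle); they are not taken from the simulator source. The sample lines are copied from the report above.

```python
def parse_stats(text: str) -> dict:
    """Parse 'name value # description' statistics lines into {name: value}."""
    stats = {}
    for line in text.splitlines():
        body, sep, _description = line.partition(" # ")
        body = body.strip()
        if not sep or not body or body.startswith(("#", "-")):
            continue  # blank line, option dump or comment: not a statistic
        name, _, raw = body.partition(" ")
        raw = raw.strip()
        if not raw:
            continue
        try:
            value = int(raw, 0)        # counters and 0x... addresses
        except ValueError:
            try:
                value = float(raw)     # rates and formula results, e.g. 1.4276
            except ValueError:
                value = raw            # e.g. '760k' or '<error: divide by zero>'
        stats[name] = value
    return stats


def derived_metrics(s: dict) -> dict:
    """Recompute a few derived lines from the raw counters."""
    cycles = s["sim_cycle"]
    return {
        "sim_IPC": s["sim_num_insn"] / cycles,
        "sim_CPI": cycles / s["sim_num_insn"],
        "ifq_occupancy": s["IFQ_count"] / cycles,
        # Little's law: latency = occupancy / rate = IFQ_count / insts executed
        "ifq_latency": s["IFQ_count"] / s["sim_total_insn"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    }


SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_total_insn             13375108 # total number of instructions executed
sim_cycle                   9330943 # total simulation time in cycles
IFQ_count                  36356715 # cumulative IFQ occupancy
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
mem.page_mem                   760k # total size of memory pages allocated
"""

for name, value in derived_metrics(parse_stats(SAMPLE)).items():
    print(f"{name:14s} {value:.4f}")
```

Each value printed (1.4276, 0.7005, 3.8964, 2.7182, 0.0466) agrees with the corresponding line of the report to four decimal places.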

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:58:30 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
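These runs use `-bpred bimod` with a 16384-entry table. A bimodal predictor is a table of 2-bit saturating counters indexed by branch address, and the sketch below illustrates that idea. The class name is hypothetical. Two details are simplifying assumptions: the index hash (drop the low 3 PC bits, then mask) and the weakly-not-taken initial state. The sketch does not model the BTB or the return-address stack that sim-outorder pairs with the direction predictor.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch address (sketch)."""

    BR_SHIFT = 3  # assumption: ignore the 3 low bits of 8-byte PISA instruction addresses

    def __init__(self, size: int = 16384) -> None:
        if size <= 0 or (size & (size - 1)) != 0:
            raise ValueError("table size must be a power of two")
        self.size = size
        # Assumption: every counter starts weakly not-taken (value 1).
        self.table = [1] * size

    def _index(self, pc: int) -> int:
        return (pc >> self.BR_SHIFT) & (self.size - 1)

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 mean "taken"; 0 and 1 mean "not taken".
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


# A loop-closing branch: taken nine times, then falls through; repeated three times.
bp = BimodalPredictor(16384)
pc = 0x400200  # hypothetical branch address
outcomes = [True] * 9 + [False]
hits = 0
for _ in range(3):
    for taken in outcomes:
        hits += bp.predict(pc) == taken
        bp.update(pc, taken)
print(hits, "of", 3 * len(outcomes))  # 26 of 30
```

After one warm-up miss, the counter mispredicts only each loop exit. The single not-taken outcome moves it from 3 to 2, which still predicts taken on the next pass. That hysteresis is why 2-bit rather than 1-bit counters are used.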

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (values listed in N, W, M, X order):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12573918 # total simulation time in cycles
sim_IPC                      1.7003 # instructions per cycle
sim_CPI                      0.5881 # cycles per instruction
sim_exec_BW                  1.7059 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49146726 # cumulative IFQ occupancy
IFQ_fcount                 11667509 # cumulative IFQ full count
ifq_occupancy                3.9086 # avg IFQ occupancy (insn's)
ifq_rate                     1.7059 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2912 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199723433 # cumulative RUU occupancy
RUU_fcount                 12447384 # cumulative RUU full count
ruu_occupancy               15.8839 # avg RUU occupancy (insn's)
ruu_rate                     1.7059 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3111 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64028451 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0922 # avg LSQ occupancy (insn's)
lsq_rate                     1.7059 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9850 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291534697 # total number of slip cycles
avg_sim_slip                13.6363 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
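In both runs reported here, the unified L2 access count equals three counters added together: the instruction-L1 misses, the data-L1 misses and the data-L1 writebacks. That is consistent with `-cache:il2 dl2` routing instruction misses into the same `ul2` that receives data-cache fills and dirty evictions. The small check below uses the printed counters. `l2_accesses_expected` is a hypothetical helper, and the "previous run" label refers to the run whose statistics precede this one.

```python
def l2_accesses_expected(il1_misses: int, dl1_misses: int, dl1_writebacks: int) -> int:
    # Every L1 miss fetches a block from ul2, and every dirty dl1 eviction
    # writes a block back to ul2. TLB misses are charged -tlb:lat cycles
    # instead and do not reach ul2.
    return il1_misses + dl1_misses + dl1_writebacks


runs = {
    # name: (il1.misses, dl1.misses, dl1.writebacks, ul2.accesses)
    "previous run": (727, 287722, 143444, 431893),
    "filter": (179, 3131, 791, 4101),
}
for name, (il1_m, dl1_m, dl1_wb, ul2_acc) in runs.items():
    print(name, l2_accesses_expected(il1_m, dl1_m, dl1_wb) == ul2_acc)
# previous run True
# filter True
```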

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 80 8 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:58:43 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         80 8 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
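The cache and TLB options above use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format from sim-outorder's help text. This rough sketch shows how such a string maps to a size. `CacheConfig` and `parse_cache_config` are hypothetical helpers, not part of SimpleScalar. Total capacity is taken as nsets * assoc * bsize, which for a TLB gives its reach in bytes. The sketch handles explicit configs only, not the `dl1`/`dl2`/`none` aliases.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}

@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size when describing a TLB)
    assoc: int
    repl: str    # one of the keys of REPLACEMENT

    @property
    def capacity_bytes(self) -> int:
        # sets * ways * block size; for a TLB this is its reach in bytes
        return self.nsets * self.assoc * self.bsize

def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (e.g. 'dl1:512:64:2:l')."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)

if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KiB, "
              f"{cfg.assoc}-way {REPLACEMENT[cfg.repl]}")
```

Under this reading, dl1 and il1 are 64 KiB each and ul2 is 512 KiB. The itlb reaches 256 KiB and the dtlb 512 KiB of 4 KiB pages.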

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861235 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37523285 # total simulation time in cycles
sim_IPC                      0.7425 # instructions per cycle
sim_CPI                      1.3469 # cycles per instruction
sim_exec_BW                  0.7425 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 150018825 # cumulative IFQ occupancy
IFQ_fcount                 37504466 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7425 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3845 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 600077948 # cumulative RUU occupancy
RUU_fcount                 37503734 # cumulative RUU full count
ruu_occupancy               15.9921 # avg RUU occupancy (insn's)
ruu_rate                     0.7425 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.5381 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 182403627 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8611 # avg LSQ occupancy (insn's)
lsq_rate                     0.7425 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5469 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  818990242 # total number of slip cycles
avg_sim_slip                29.3970 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
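Several of the statistics above are formulas over the raw counters. This minimal sketch assumes each stat line has the shape `<name> <value> # <description>`. It parses a log excerpt and recomputes a few of those formulas. `parse_stats`, `derived` and `SAMPLE` are hypothetical names. The formulas (for example, RUU latency as `RUU_count / sim_total_insn`) are ones that reproduce this run's reported values to four decimals. They are not quoted from the simulator source.

```python
import re

# Matches stat lines such as "sim_cycle   37523285 # total simulation time in cycles".
# Option lines (leading '-') and help/comment lines (leading '#') are not matched.
STAT_LINE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>\S+)\s+#")

def parse_stats(text: str) -> dict:
    """Return {stat name: float} for every numeric stat line in `text`."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m is None:
            continue  # includes "<error: divide by zero>" lines
        try:
            stats[m.group("name")] = float(m.group("value"))
        except ValueError:
            pass  # hex segment bases and sizes such as "1480k"
    return stats

def derived(stats: dict) -> dict:
    """Recompute some formula statistics from the raw counters."""
    cycles = stats["sim_cycle"]
    committed = stats["sim_num_insn"]
    return {
        "IPC": committed / cycles,
        "CPI": cycles / committed,
        "dl1_miss_rate": stats["dl1.misses"] / stats["dl1.accesses"],
        "ruu_occupancy": stats["RUU_count"] / cycles,
        "ruu_latency": stats["RUU_count"] / stats["sim_total_insn"],
        "bpred_dir_rate": stats["bpred_bimod.dir_hits"] / stats["bpred_bimod.updates"],
    }

SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_total_insn             27861235 # total number of instructions executed
sim_cycle                  37523285 # total simulation time in cycles
RUU_count                 600077948 # cumulative RUU occupancy
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

if __name__ == "__main__":
    for name, value in derived(parse_stats(SAMPLE)).items():
        print(f"{name:16s} {value:.4f}")
```

On this excerpt, the recomputed values round to the reported 0.7425, 1.3469, 0.2972, 15.9921, 21.5381 and 0.9998. These definitions also give `ruu_occupancy = ruu_rate * ruu_latency` (0.7425 * 21.5381 is about 15.99). That is why an RUU that is full 99.95% of the time shows up both as near-16 occupancy and as long residency.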

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 10:59:07 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
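The `-bpred:2lev` option takes `<l1size> <l2size> <hist_size> <xor>`, which the help text calls N, M, W, X. The help text's sample predictors, however, are listed in N, W, M, X order. This small sketch turns a named scheme into the option's argument order. The function `two_level_args` is hypothetical. PAp is left out because its stated constraint, M == 2^(N+W), appears to mix an entry count with a bit width.

```python
from typing import Optional

def two_level_args(scheme: str, w: int, n: int = 1, m: Optional[int] = None) -> str:
    """Return '<l1size> <l2size> <hist_size> <xor>' for a named 2-level scheme.

    w: history (shift register) width W; n: first-level entries N (PAg only);
    m: second-level entries M (GAp only, must exceed 2**w).
    """
    if scheme == "GAg":        # 1, W, 2^W, 0
        l1, l2, xor = 1, 2 ** w, 0
    elif scheme == "GAp":      # 1, W, M (M > 2^W), 0
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs m > 2**w")
        l1, l2, xor = 1, m, 0
    elif scheme == "PAg":      # N, W, 2^W, 0
        l1, l2, xor = n, 2 ** w, 0
    elif scheme == "gshare":   # 1, W, 2^W, 1
        l1, l2, xor = 1, 2 ** w, 1
    else:
        raise ValueError(f"unsupported scheme {scheme!r}")
    # -bpred:2lev expects N, M, W, X (l1size, l2size, hist_size, xor).
    return f"{l1} {l2} {w} {xor}"

if __name__ == "__main__":
    print("-bpred 2lev -bpred:2lev", two_level_args("gshare", 12))
```

By this table, the default `-bpred:2lev 1 1024 8 0` printed above is GAp with an 8-bit history, since 1024 > 2^8. These runs use `-bpred bimod`, so the 2-level settings are never exercised.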

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6883569 # total simulation time in cycles
sim_IPC                      1.9041 # instructions per cycle
sim_CPI                      0.5252 # cycles per instruction
sim_exec_BW                  1.9102 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25623424 # cumulative IFQ occupancy
IFQ_fcount                  6280253 # cumulative IFQ full count
ifq_occupancy                3.7224 # avg IFQ occupancy (insn's)
ifq_rate                     1.9102 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9487 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105774941 # cumulative RUU occupancy
RUU_fcount                  5716414 # cumulative RUU full count
ruu_occupancy               15.3663 # avg RUU occupancy (insn's)
ruu_rate                     1.9102 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0443 # avg RUU occupant latency (cycle's)
ruu_full                     0.8304 # fraction of time (cycle's) RUU was full
LSQ_count                  32348442 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6994 # avg LSQ occupancy (insn's)
lsq_rate                     1.9102 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4601 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  155173865 # total number of slip cycles
avg_sim_slip                11.8389 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
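The `-mem:lat <first_chunk> <inter_chunk>` and `-mem:width` options combine into a per-block memory latency. This sketch assumes the stock sim-outorder model. In that model, a block moves in ceil(block size / bus width) chunks: the first chunk costs `first_chunk` cycles and each later chunk costs `inter_chunk`. This build's `-mem:maxBurstLength` and `-mem:minBurstLength` options are not modeled, so treat the results as approximate. `mem_access_latency` here is an illustrative re-statement, not code taken from this build.

```python
def mem_access_latency(blk_size: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to move one block under a first-chunk / inter-chunk model.

    Burst-length limits (-mem:maxBurstLength / -mem:minBurstLength) are ignored.
    """
    if blk_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_size + bus_width - 1) // bus_width  # ceil(blk_size / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)

if __name__ == "__main__":
    # 64-byte ul2 blocks with -mem:lat 72 6 and -mem:width 8
    print(mem_access_latency(64, 72, 6, 8))
```

For the run above (`-mem:lat 72 6`, `-mem:width 8`), this model gives 72 + 6 * 7 = 114 cycles for a 64-byte ul2 block. The earlier `-mem:lat 80 8` with `-mem:width 64` configuration moves the same block in a single 80-cycle chunk.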

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 10:59:15 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
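The -mem:lat, -mem:width and cache block-size settings above together determine how long one block fill from main memory takes. Below is a minimal sketch. It assumes the chunked model of stock SimpleScalar 3.0 sim-outorder: the first bus-width chunk costs <first_chunk> cycles, and each further chunk costs <inter_chunk> cycles. The helper name mem_block_latency is hypothetical. The sketch ignores -mem:maxBurstLength and -mem:minBurstLength, which are not part of stock 3.0 and whose effect is not described in this output.

```python
# Hypothetical helper, not part of SimpleScalar; burst-length options are ignored.
def mem_block_latency(block_bytes: int, bus_width: int,
                      first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one cache block from memory under an assumed chunked model.

    The block is split into ceil(block_bytes / bus_width) bus-width chunks;
    the first chunk costs first_chunk cycles and every later one inter_chunk.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_bytes // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l) on an 8-byte bus (-mem:width 8).
print(mem_block_latency(64, 8, 72, 6))   # -mem:lat 72 6
print(mem_block_latency(64, 8, 91, 10))  # a slower -mem:lat 91 10, for comparison
```

Under this model a 64-byte block is 8 chunks, so -mem:lat 72 6 costs 72 + 7*6 = 114 cycles and -mem:lat 91 10 costs 161 cycles.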

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if prefixed with a `+', is a value relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
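The range grammar above is small enough to parse directly. The sketch below is a hypothetical Python helper; parse_range and Bound are illustrative names and are not part of SimpleScalar. When both values are numeric, it resolves a `+' end against the start, as in 1000:+100 == 1000:1100. It leaves program symbols such as `main' unresolved, because looking them up requires the simulated binary.

```python
# Hypothetical pipetrace-range parser, not part of SimpleScalar.
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Value = Union[int, str]  # int for numeric values, str for unresolved program symbols

_PREFIX_KINDS = {"@": "addr", "#": "cycle"}


@dataclass
class Bound:
    kind: str               # "addr" (@), "cycle" (#) or "insn" (no prefix)
    value: Value
    relative: bool = False  # True only for a "+<end>" bound that could not be resolved


def _parse_value(text: str) -> Value:
    # int(text, 0) accepts decimal and 0x-prefixed hex; anything else is kept as a symbol.
    try:
        return int(text, 0)
    except ValueError:
        return text


def _parse_bound(text: str, allow_plus: bool) -> Optional[Bound]:
    if not text:
        return None                      # this end of the range was omitted
    if text[0] in _PREFIX_KINDS:
        return Bound(_PREFIX_KINDS[text[0]], _parse_value(text[1:]))
    if allow_plus and text[0] == "+":
        return Bound("insn", _parse_value(text[1:]), relative=True)
    return Bound("insn", _parse_value(text))


def parse_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"pipetrace range needs a ':' separator: {spec!r}")
    start = _parse_bound(start_text, allow_plus=False)
    end = _parse_bound(end_text, allow_plus=True)
    if end is not None and end.relative and start is not None:
        end.kind = start.kind            # "+N" counts in the same units as the start
        if isinstance(start.value, int) and isinstance(end.value, int):
            end = Bound(start.kind, start.value + end.value)
    return start, end
```

For example, parse_range("@main:+278") keeps the symbol and returns Bound("addr", "main") with an end of Bound("addr", 278, relative=True). parse_range(":") returns (None, None), which means the whole execution is traced.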

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (listed as N, W, M, X):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
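The -bpred:2lev flag takes its arguments in the order <l1size> <l2size> <hist_size> <xor>, i.e. N, M, W, X. The sample predictors, however, are written as N, W, M, X. The hypothetical helper below (two_level_args is an illustrative name) builds the flag's argument string for each listed scheme. It enforces GAp's M > 2^W condition. For PAp, it leaves M to the caller and does not check the stated constraint.

```python
# Hypothetical helper, not part of SimpleScalar: emits -bpred:2lev arguments in N M W X order.
from typing import Optional


def two_level_args(scheme: str, w: int, n: int = 1, m: Optional[int] = None) -> str:
    """Return '<l1size> <l2size> <hist_size> <xor>' for the named 2-level scheme."""
    scheme = scheme.lower()
    if scheme in ("gag", "gshare"):
        n, m = 1, 2 ** w                 # one global history register, 2^W counters
    elif scheme == "pag":
        m = 2 ** w                       # N history registers share 2^W counters
    elif scheme == "gap":
        n = 1
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
    elif scheme == "pap":
        if m is None:
            raise ValueError("PAp needs an explicit M")
    else:
        raise ValueError(f"unknown 2-level scheme: {scheme!r}")
    xor = 1 if scheme == "gshare" else 0  # only gshare XORs history with the address
    return f"{n} {m} {w} {xor}"


print(two_level_args("gshare", 8))        # gshare with an 8-bit global history
print(two_level_args("GAp", 8, m=1024))   # same shape as the -bpred:2lev default dumped above
```

The second call prints "1 1024 8 0", which matches the dumped default `-bpred:2lev 1 1024 8 0`. Under this reading, that default is a GAp configuration, since 1024 > 2^8.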

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
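A <config> string implies a capacity of <nsets> x <bsize> x <assoc> bytes. Below is a minimal Python sketch that parses the format described above and computes that product; parse_cache_config and capacity_bytes are hypothetical helpers. It handles only explicit <config> strings, not the dl1, dl2, or none pointer values. It also does not check any further constraints the simulator may impose on the fields. TLB configs use the same format, with the page size in the <bsize> position, so the same product gives the TLB reach.

```python
# Hypothetical helpers, not part of SimpleScalar.
from typing import NamedTuple

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l', 'f' or 'r'


def parse_cache_config(spec: str) -> CacheConfig:
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg: CacheConfig) -> int:
    return cfg.nsets * cfg.bsize * cfg.assoc


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {capacity_bytes(cfg) // 1024} KiB, {REPL_POLICIES[cfg.repl]}")
```

For the configuration dumped above, this gives the following:

- dl1: 64 KiB (il1 is identical).
- ul2: 512 KiB.
- TLB reach: 256 KiB for itlb and 512 KiB for dtlb.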



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264833 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6453905 # total simulation time in cycles
sim_IPC                      1.7945 # instructions per cycle
sim_CPI                      0.5573 # cycles per instruction
sim_exec_BW                  1.9004 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19095993 # cumulative IFQ occupancy
IFQ_fcount                  3945839 # cumulative IFQ full count
ifq_occupancy                2.9588 # avg IFQ occupancy (insn's)
ifq_rate                     1.9004 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5570 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6114 # fraction of time (cycle's) IFQ was full
RUU_count                  78630887 # cumulative RUU occupancy
RUU_fcount                  3343885 # cumulative RUU full count
ruu_occupancy               12.1835 # avg RUU occupancy (insn's)
ruu_rate                     1.9004 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4111 # avg RUU occupant latency (cycle's)
ruu_full                     0.5181 # fraction of time (cycle's) RUU was full
LSQ_count                  32434143 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0255 # avg LSQ occupancy (insn's)
lsq_rate                     1.9004 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6445 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  125132139 # total number of slip cycles
avg_sim_slip                10.8045 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820212 # total number of accesses
il1.hits                   12819995 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820212 # total number of accesses
itlb.hits                  12820205 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917906 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
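Most of the derived statistics above are ratios of raw counters. The sketch below reads `<name> <value> # <description>' lines and recomputes a few of them; parse_stats and derived are hypothetical helper names. The formulas are inferred from the descriptions, not taken from the simulator source. They do agree with the printed values for this run; for example, 11581513 / 6453905 rounds to the reported sim_IPC of 1.7945. Option lines, which start with `-' or `#', are skipped.

```python
# Hypothetical helpers; the ratio formulas are inferred from the stat descriptions.
import re
from typing import Dict

# A statistic line: identifier, one whitespace-free value, then a '#' comment.
# Option lines start with '-' or '#', so they never match the identifier pattern.
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#")


def parse_stats(text: str) -> Dict[str, str]:
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = match.group(2)  # raw text: values may be hex or '292k'
    return stats


def derived(stats: Dict[str, str]) -> Dict[str, float]:
    def num(key: str) -> float:
        return float(stats[key])

    return {
        "IPC": num("sim_num_insn") / num("sim_cycle"),
        "CPI": num("sim_cycle") / num("sim_num_insn"),
        "IPB": num("sim_num_insn") / num("sim_num_branches"),
        "ruu_occupancy": num("RUU_count") / num("sim_cycle"),
        "ruu_latency": num("RUU_count") / num("sim_total_insn"),
        "dl1_miss_rate": num("dl1.misses") / num("dl1.accesses"),
        "bpred_dir_rate": num("bpred_bimod.dir_hits") / num("bpred_bimod.updates"),
    }
```

Applied to this run's counters and rounded to four places, derived() reproduces the printed values:

- sim_IPC 1.7945, sim_CPI 0.5573 and sim_IPB 3.7020.
- ruu_occupancy 12.1835 and ruu_latency 6.4111.
- dl1.miss_rate 0.0025 and bpred_dir_rate 0.9560.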

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 10:59:23 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375377 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9969191 # total simulation time in cycles
sim_IPC                      1.3362 # instructions per cycle
sim_CPI                      0.7484 # cycles per instruction
sim_exec_BW                  1.3417 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  38812601 # cumulative IFQ occupancy
IFQ_fcount                  9552506 # cumulative IFQ full count
ifq_occupancy                3.8933 # avg IFQ occupancy (insn's)
ifq_rate                     1.3417 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.9018 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9582 # fraction of time (cycle's) IFQ was full
RUU_count                 156170656 # cumulative RUU occupancy
RUU_fcount                  9411523 # cumulative RUU full count
ruu_occupancy               15.6653 # avg RUU occupancy (insn's)
ruu_rate                     1.3417 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.6760 # avg RUU occupant latency (cycle's)
ruu_full                     0.9441 # fraction of time (cycle's) RUU was full
LSQ_count                  82706479 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2962 # avg LSQ occupancy (insn's)
lsq_rate                     1.3417 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.1835 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  258591832 # total number of slip cycles
avg_sim_slip                19.4125 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
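The preceding run in this log uses the same cache configuration, yet the dl1 miss rate rises from 0.0025 there to 0.0466 in this fft run. One rough way to express that in cycles is the textbook two-level average memory access time: AMAT = L1 hit time + L1 miss rate x (L2 hit time + L2 miss rate x memory penalty). This sketch rests on several assumptions and is not how sim-outorder computes timing:

- It ignores TLB misses, writebacks and bus contention.
- It ignores the latency overlap of the out-of-order core.
- It treats the unified ul2's overall miss rate as if it applied to data misses only.
- The 114-cycle memory penalty assumes a 64-byte block moved over the 8-byte bus at -mem:lat 72 6 (72 + 7 x 6).

```python
# Back-of-envelope AMAT; a textbook model, not sim-outorder's timing model.
def amat(l1_hit: float, l1_miss_rate: float,
         l2_hit: float, l2_miss_rate: float, mem_penalty: float) -> float:
    """Two-level average memory access time, in cycles."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_penalty)


MEM_PENALTY = 72 + 6 * (64 // 8 - 1)  # assumed: first chunk + 7 further 8-byte chunks = 114

# Latencies from -cache:dl1lat 3 and -cache:dl2lat 12; miss rates are misses / accesses.
fft = amat(3, 287722 / 6169439, 12, 32394 / 431893, MEM_PENALTY)
previous = amat(3, 11205 / 4497783, 12, 4196 / 18501, MEM_PENALTY)
print(f"fft: {fft:.2f} cycles, previous run: {previous:.2f} cycles")
```

This gives roughly 3.96 cycles per data access for fft, against about 3.09 for the preceding run. The fft run also shows the RUU full in about 94% of cycles (ruu_full 0.9441), compared with about 52% (0.5181) before. That is consistent with, though not proof of, longer-latency loads holding RUU entries.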

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 10:59:34 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
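The memory options above set a 72-cycle first-chunk latency, a 6-cycle inter-chunk latency and an 8-byte bus. ul2 uses 64-byte blocks. The sketch below estimates the fill time of one L2 miss under those settings. It assumes the chunked model of stock SimpleScalar 3.0 sim-outorder: the first bus-width chunk costs <first_chunk> cycles and each further chunk costs <inter_chunk> cycles. As far as we know, -mem:maxBurstLength and -mem:minBurstLength are not part of the stock release, so this build may time bursts differently. The function name and its parameters are illustrative.

```python
def mem_access_latency(block_bytes: int, first_chunk: int,
                       inter_chunk: int, bus_width: int) -> int:
    """Cycles to bring one block in from main memory.

    Assumed model (stock sim-outorder): the block moves in
    bus_width-byte chunks; the first chunk costs first_chunk cycles
    and each later chunk costs inter_chunk cycles. Burst-length
    limits are ignored.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_bytes // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l); -mem:lat 72 6.
print(mem_access_latency(64, 72, 6, 8))   # -mem:width 8  -> 114
print(mem_access_latency(64, 72, 6, 16))  # -mem:width 16 -> 90
```

Under this model, the last run in this section widens the bus with -mem:width 16, which cuts a 64-byte fill from 114 to 90 cycles.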

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
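A small parser can illustrate the range grammar above. This is a sketch and not SimpleScalar's own range parser. The names RangeEnd and parse_ptrace_range are hypothetical. Program symbols such as main are kept as text rather than looked up in a symbol table. A `+N' end is made absolute only when the start is numeric.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Prefixes from the help text: '@' = address, '#' = cycle count,
# anything else = instruction count.
_PREFIX_KIND = {"@": "addr", "#": "cycle"}


@dataclass
class RangeEnd:
    kind: Optional[str]    # "addr", "cycle", "insn", or None if omitted
    value: Optional[str]   # number or program symbol, None if omitted


def _as_int(text: str) -> Optional[int]:
    try:
        return int(text, 0)   # decimal or 0x-prefixed hex
    except ValueError:
        return None           # a program symbol such as "main"


def _parse_end(text: str) -> RangeEnd:
    if not text:
        return RangeEnd(None, None)
    kind = _PREFIX_KIND.get(text[0])
    if kind is None:
        return RangeEnd("insn", text)
    return RangeEnd(kind, text[1:])


def parse_ptrace_range(spec: str) -> Tuple[RangeEnd, RangeEnd]:
    """Split a '<start>:<end>' pipetrace range into its two ends."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _parse_end(start_text)
    if not end_text.startswith("+"):
        return start, _parse_end(end_text)
    # A '+N' end is relative to the start and shares its kind.
    base = _as_int(start.value) if start.value else None
    offset = _as_int(end_text[1:])
    if base is not None and offset is not None:
        return start, RangeEnd(start.kind, str(base + offset))
    return start, RangeEnd(start.kind, end_text)  # symbolic start: unresolved
```

For example, parse_ptrace_range("1000:+100") returns an instruction-count range that ends at 1100, which matches the 1000:+100 == 1000:1100 rule. parse_ptrace_range("@main:+278") leaves the end as the unresolved address offset +278.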

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (given in the same N, M, W, X order):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M, W, 0   (M > 2^W)
      PAg     : N, 2^W, W, 0
      PAp     : N, M, W, 0   (M == N * 2^W)
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
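The options dump passes the 2-level parameters in the order N M W X (-bpred:2lev 1 1024 8 0, i.e. <l1size> <l2size> <hist_size> <xor>). These runs select -bpred bimod, so that setting is inactive here. The sketch below classifies such a configuration against the sample predictors and estimates its storage. It assumes that N and M must be powers of two and that each second-level entry is a 2-bit saturating counter. two_level_summary is a hypothetical helper, and any per-address configuration other than PAg is reported as "other".

```python
def two_level_summary(n: int, m: int, w: int, x: int) -> dict:
    """Summarize a 2-level predictor given in command-line order N M W X.

    Assumptions for illustration: N and M are powers of two and each
    second-level entry is a 2-bit saturating counter.
    """
    for name, size in (("N", n), ("M", m)):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{name}={size} is not a power of two")
    if w <= 0:
        raise ValueError("history width W must be positive")
    if n == 1 and m == 2 ** w:
        scheme = "gshare" if x else "GAg"
    elif n == 1 and m > 2 ** w and not x:
        scheme = "GAp"
    elif n > 1 and m == 2 ** w and not x:
        scheme = "PAg"
    else:
        scheme = "other"
    history_bits = n * w   # N shift registers of W bits each
    counter_bits = 2 * m   # M two-bit counters
    return {"scheme": scheme,
            "history_bits": history_bits,
            "counter_bits": counter_bits,
            "total_bytes": (history_bits + counter_bits) / 8}


# The -bpred:2lev value in the options dump: 1 1024 8 0.
print(two_level_summary(1, 1024, 8, 0))
# {'scheme': 'GAp', 'history_bits': 8, 'counter_bits': 2048, 'total_bytes': 257.0}
```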

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
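A cache (or TLB) holds <nsets> x <assoc> blocks of <bsize> bytes, so its capacity follows directly from the config string. The parser sketch below assumes that <nsets> and <bsize> are powers of two, which makes the block offset and set index whole bit fields. CacheConfig and parse_cache_config are illustrative names, not SimpleScalar functions.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc

    @property
    def offset_bits(self) -> int:
        return self.bsize.bit_length() - 1   # log2 of a power of two

    @property
    def index_bits(self) -> int:
        return self.nsets.bit_length() - 1


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])
    for size in (cfg.nsets, cfg.bsize):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{spec!r}: nsets and bsize assumed powers of two")
    if cfg.assoc <= 0:
        raise ValueError(f"{spec!r}: associativity must be positive")
    return cfg


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(spec)
    print(c.name, c.capacity_bytes // 1024, "KB", c.repl)
```

For the configurations used in these runs, the loop prints 64 KB for dl1 (il1 is identical), 512 KB for ul2, and 512 KB for dtlb. For a TLB the block size is the 4096-byte page, so that figure is the reach of its 32 x 4 = 128 entries.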

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
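The runs in this section use the unified-L2 arrangement (-cache:il2 dl2), so ul2 sees instruction-fetch misses as well as data traffic. In both statistics blocks below, ul2.accesses equals il1.misses + dl1.misses + dl1.writebacks. In other words, every ul2 access is either an L1 fill or a dirty dl1 writeback. The sketch checks that identity with counters copied from the log. The identity is an observation about these two runs, not a documented guarantee of the simulator. l1_to_l2_traffic is a hypothetical helper.

```python
# Counters copied from the two statistics blocks of this section.
RUNS = {
    "first run": {"il1.misses": 179, "dl1.misses": 3131,
                  "dl1.writebacks": 791, "ul2.accesses": 4101},
    "alphaBlend": {"il1.misses": 211, "dl1.misses": 2572213,
                   "dl1.writebacks": 957274, "ul2.accesses": 3529698},
}


def l1_to_l2_traffic(stats: dict) -> int:
    """L1 misses (fills) plus dirty dl1 writebacks sent to the unified L2."""
    return stats["il1.misses"] + stats["dl1.misses"] + stats["dl1.writebacks"]


for name, stats in RUNS.items():
    traffic = l1_to_l2_traffic(stats)
    status = "matches" if traffic == stats["ul2.accesses"] else "differs from"
    print(f"{name}: L1->L2 traffic {traffic} {status} "
          f"ul2.accesses {stats['ul2.accesses']}")
```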



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12650282 # total simulation time in cycles
sim_IPC                      1.6900 # instructions per cycle
sim_CPI                      0.5917 # cycles per instruction
sim_exec_BW                  1.6956 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49428518 # cumulative IFQ occupancy
IFQ_fcount                 11737957 # cumulative IFQ full count
ifq_occupancy                3.9073 # avg IFQ occupancy (insn's)
ifq_rate                     1.6956 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3043 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200852267 # cumulative RUU occupancy
RUU_fcount                 12517832 # cumulative RUU full count
ruu_occupancy               15.8773 # avg RUU occupancy (insn's)
ruu_rate                     1.6956 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3637 # avg RUU occupant latency (cycle's)
ruu_full                     0.9895 # fraction of time (cycle's) RUU was full
LSQ_count                  64382459 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0894 # avg LSQ occupancy (insn's)
lsq_rate                     1.6956 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0015 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  293017335 # total number of slip cycles
avg_sim_slip                13.7057 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
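Every rate in the statistics above is a ratio of two raw counters, as the comments note: sim_IPC is committed instructions per cycle, miss_rate is misses/ref, and bpred_dir_rate is all-hits/updates. The minimal sketch below parses a few lines copied from this block and recomputes three of those rates to the four decimals the simulator prints. It assumes that each line has the `name value # description' shape shown here and that only statistics lines are passed in. parse_stats is a hypothetical helper, not a SimpleScalar tool.

```python
import re

# "<name> <value> # <description>", as printed in the block above.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")

SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12650282 # total simulation time in cycles
sim_IPC                      1.6900 # instructions per cycle
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate
"""


def parse_stats(text: str) -> dict:
    """Map stat name -> float, or the raw string (e.g. '120k', '0x00400000')."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match is None:
            continue  # headers, blank lines, '<error: ...>' values
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw
    return stats


# (reported stat, numerator, denominator) for three derived rates.
DERIVED = [
    ("sim_IPC", "sim_num_insn", "sim_cycle"),
    ("dl1.miss_rate", "dl1.misses", "dl1.accesses"),
    ("bpred_bimod.bpred_dir_rate", "bpred_bimod.dir_hits", "bpred_bimod.updates"),
]

stats = parse_stats(SAMPLE)
for reported, num, den in DERIVED:
    derived = round(stats[num] / stats[den], 4)
    print(f"{reported}: derived {derived}, reported {stats[reported]}")
```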

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 10:59:48 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861507 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  39061071 # total simulation time in cycles
sim_IPC                      0.7132 # instructions per cycle
sim_CPI                      1.4021 # cycles per instruction
sim_exec_BW                  0.7133 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 156141545 # cumulative IFQ occupancy
IFQ_fcount                 39035146 # cumulative IFQ full count
ifq_occupancy                3.9974 # avg IFQ occupancy (insn's)
ifq_rate                     0.7133 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.6042 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9993 # fraction of time (cycle's) IFQ was full
RUU_count                 624570290 # cumulative RUU occupancy
RUU_fcount                 39034346 # cumulative RUU full count
ruu_occupancy               15.9896 # avg RUU occupancy (insn's)
ruu_rate                     0.7133 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.4170 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 189803761 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8592 # avg LSQ occupancy (insn's)
lsq_rate                     0.7133 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.8124 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  850882582 # total number of slip cycles
avg_sim_slip                30.5417 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
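Each of the IFQ, RUU and LSQ averages is derived from a single cumulative counter. Occupancy divides that counter by sim_cycle, and latency divides it by sim_total_insn. The rate is sim_total_insn/sim_cycle, which equals sim_exec_BW. The three averages therefore satisfy Little's law (L = lambda x W) by construction. The sketch below uses counters copied from the alphaBlend block above and reproduces the reported values. These definitions are inferred from the numbers and stat descriptions in this log rather than taken from the simulator source. queue_averages is a hypothetical helper.

```python
import math

# Cumulative counters copied from the alphaBlend statistics block above.
SIM_CYCLE = 39061071
SIM_TOTAL_INSN = 27861507   # committed plus mis-speculated instructions
QUEUE_COUNT = {"IFQ": 156141545, "RUU": 624570290, "LSQ": 189803761}


def queue_averages(cumulative: int, cycles: int, insns: int) -> tuple:
    """Return (occupancy, rate, latency) like the *_occupancy/_rate/_latency stats."""
    occupancy = cumulative / cycles   # average entries held per cycle
    rate = insns / cycles             # instructions passing through per cycle
    latency = cumulative / insns      # average cycles each instruction stays
    return occupancy, rate, latency


for name, count in QUEUE_COUNT.items():
    occ, rate, lat = queue_averages(count, SIM_CYCLE, SIM_TOTAL_INSN)
    # Little's law: occupancy = arrival rate x time spent in the queue.
    assert math.isclose(occ, rate * lat)
    print(f"{name}: occupancy {occ:.4f} = rate {rate:.4f} x latency {lat:.4f}")
```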

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:00:12 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
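The `-mem:lat` and `-mem:width` values above determine how long one block fill from main memory takes. Stock sim-outorder 3.0 charges the first-chunk latency once, plus the inter-chunk latency for every further bus-width chunk of the block. The sketch below reproduces that arithmetic under the assumption that this build keeps the stock model; `-mem:maxBurstLength` and `-mem:minBurstLength` do not appear to be stock 3.0 options, so this modified simulator may charge fills differently. `block_fill_latency` is a hypothetical helper name.

```python
def block_fill_latency(blk_bytes: int, bus_width: int, lat_first: int, lat_inter: int) -> int:
    """Cycles to fetch one cache block from memory (assumed stock model)."""
    if blk_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    # Number of bus-width transfers needed for the block, rounded up.
    chunks = (blk_bytes + bus_width - 1) // bus_width
    # First chunk pays the full latency; each later chunk pays the inter-chunk latency.
    return lat_first + lat_inter * (chunks - 1)


# 64-byte ul2 blocks with -mem:width 16 -mem:lat 72 6 (this run): 4 chunks
print(block_fill_latency(64, 16, 72, 6))   # 90
# Same blocks with -mem:width 8 -mem:lat 91 10 (an earlier run in this file): 8 chunks
print(block_fill_latency(64, 8, 91, 10))   # 161
```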

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced; those that start with `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  If
  the second argument is prefixed with `+', it is relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
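The range grammar above is compact enough to parse by hand. The following is a minimal sketch, not SimpleScalar's own parser: it assumes numbers are written in decimal or `0x` hex, resolves program symbols through a caller-supplied table (the address used for `main` below is made up), and, as an extra assumption, rejects ranges whose two ends use different position kinds. `parse_ptrace_range` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

# Prefix characters that select the kind of position; anything else is an instruction count.
KINDS = {"@": "addr", "#": "cycle"}


def _parse_end(text: str, symbols: Dict[str, int]) -> Tuple[Optional[str], Optional[int], bool]:
    """Split one endpoint into (kind, value, is_relative); an empty endpoint is (None, None, False)."""
    if not text:
        return None, None, False
    kind, relative = "inst", False
    if text[0] in KINDS:
        kind, text = KINDS[text[0]], text[1:]
    elif text[0] == "+":
        relative, text = True, text[1:]
    # Program symbols come from the caller's table; otherwise decimal or 0x hex (assumed).
    value = symbols[text] if text in symbols else int(text, 0)
    return kind, value, relative


def parse_ptrace_range(spec: str, symbols: Optional[Dict[str, int]] = None
                       ) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (kind, start, end), with None for an omitted endpoint."""
    symbols = symbols or {}
    if ":" not in spec:
        raise ValueError(f"range {spec!r} must contain ':'")
    start_txt, end_txt = spec.split(":", 1)
    start_kind, start, start_rel = _parse_end(start_txt, symbols)
    if start_rel:
        raise ValueError("'+' is only allowed on the end of a range")
    end_kind, end, end_rel = _parse_end(end_txt, symbols)
    if end_rel:
        if start is None:
            raise ValueError("a relative end needs an explicit start")
        # '+N' is an offset from the start, in the start's units.
        end_kind, end = start_kind, start + end
    if start_kind and end_kind and start_kind != end_kind:
        raise ValueError("both ends of a range must be the same kind")
    return start_kind or end_kind or "inst", start, end


print(parse_ptrace_range("#0:#1000"))                         # ('cycle', 0, 1000)
print(parse_ptrace_range("1000:+100"))                        # ('inst', 1000, 1100)
print(parse_ptrace_range("@main:+278", {"main": 0x400200}))   # hypothetical address for main
```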

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in the same N, M, W, X order as -bpred:2lev):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
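To make the N, M, W, X parameters concrete, here is a toy two-level predictor whose constructor arguments follow the `-bpred:2lev` order (`<l1size> <l2size> <hist_size> <xor>`). It is a sketch written from the description above, not a transcription of SimpleScalar's `bpred.c`. It assumes power-of-two table sizes, 8-byte PISA instructions (so the low 3 PC bits are dropped), 2-bit saturating counters initialized to weakly taken, and, in the non-XOR case, address bits placed above the W history bits to form the second-level index. `TwoLevelPredictor` is a hypothetical class.

```python
class TwoLevelPredictor:
    """Toy N, M, W, X two-level direction predictor (illustrative only)."""

    def __init__(self, n: int, m: int, w: int, xor: bool, inst_shift: int = 3):
        for size in (n, m):
            if size <= 0 or size & (size - 1):
                raise ValueError("N and M must be powers of two")
        self.n, self.m, self.w, self.xor = n, m, w, xor
        self.inst_shift = inst_shift
        self.history = [0] * n   # first level: N shift registers, W bits each
        self.counters = [2] * m  # second level: M 2-bit counters, weakly taken (assumed)

    def _row(self, pc: int) -> int:
        # Branch address (without instruction-offset bits) selects a shift register.
        return (pc >> self.inst_shift) & (self.n - 1)

    def _index(self, pc: int) -> int:
        addr = pc >> self.inst_shift
        hist = self.history[self._row(pc)]
        if self.xor:
            idx = hist ^ addr              # gshare-style hash of history and address
        else:
            idx = hist | (addr << self.w)  # address bits sit above the history bits
        return idx & (self.m - 1)

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        row = self._row(pc)
        # Shift the outcome into the history register, keeping only W bits.
        self.history[row] = ((self.history[row] << 1) | int(taken)) & ((1 << self.w) - 1)


# GAg-style setup: N=1, M=4 (= 2^W), W=2, no XOR; learns a strictly alternating branch.
p = TwoLevelPredictor(1, 4, 2, False)
for i in range(20):
    p.update(0x400200, i % 2 == 0)   # taken, not taken, taken, ...
print(p.predict(0x400200))           # next outcome in the pattern is "taken"
```

Because the history register sees the last two outcomes, each pattern position maps to its own counter, which a single bimodal counter per branch could not do for this alternating case.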

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
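As a quick check on the configurations used in this run, the sketch below splits a `<config>` string into its five fields and multiplies sets, block size, and associativity to get capacity. The same format is used for the TLBs, where `<bsize>` is the page size, so the product is the TLB's reach. `CacheConfig` and `parse_cache_config` are hypothetical names, and the parser does not enforce any further constraints (such as power-of-two sizes) that the simulator may impose.

```python
from dataclasses import dataclass

# Replacement-policy letters from the <repl> field.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # bytes per block (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for cfg in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(cfg)
    print(c.name, c.capacity_bytes // 1024, "KiB", c.repl)
```

With the settings in this run, this gives 64 KiB for each L1 cache, 512 KiB for the unified L2, and 512 KiB of reach for the 128-entry data TLB.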

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6836457 # total simulation time in cycles
sim_IPC                      1.9172 # instructions per cycle
sim_CPI                      0.5216 # cycles per instruction
sim_exec_BW                  1.9234 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
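The summary rates are simple ratios of the raw counters. The sketch below recomputes them; rounding to four decimals reproduces the printed `sim_IPC`, `sim_CPI`, `sim_exec_BW`, and `sim_IPB` for this run. `derived_rates` is a hypothetical helper.

```python
def derived_rates(num_insn: int, total_insn: int, cycles: int, branches: int) -> dict:
    """Recompute sim-outorder's summary ratios from raw counters."""
    return {
        "sim_IPC": num_insn / cycles,          # committed instructions per cycle
        "sim_CPI": cycles / num_insn,          # cycles per committed instruction
        "sim_exec_BW": total_insn / cycles,    # executed (incl. mis-speculated) per cycle
        "sim_IPB": num_insn / branches,        # committed instructions per branch
    }


matrix = derived_rates(num_insn=13107124, total_insn=13148990,
                       cycles=6836457, branches=1010850)
print({k: round(v, 4) for k, v in matrix.items()})
```

Feeding in the sort run's counters gives an IPB of about 3.70, i.e. a branch roughly every four committed instructions versus about one every thirteen for matrix.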
IFQ_count                  25451776 # cumulative IFQ occupancy
IFQ_fcount                  6237341 # cumulative IFQ full count
ifq_occupancy                3.7229 # avg IFQ occupancy (insn's)
ifq_rate                     1.9234 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9356 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105086477 # cumulative RUU occupancy
RUU_fcount                  5673502 # cumulative RUU full count
ruu_occupancy               15.3715 # avg RUU occupancy (insn's)
ruu_rate                     1.9234 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9920 # avg RUU occupant latency (cycle's)
ruu_full                     0.8299 # fraction of time (cycle's) RUU was full
LSQ_count                  32118426 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6981 # avg LSQ occupancy (insn's)
lsq_rate                     1.9234 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4427 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154255817 # total number of slip cycles
avg_sim_slip                11.7689 # the average slip between issue and retirement
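The IFQ, RUU, and LSQ averages follow one pattern: each cumulative occupancy count is divided by cycles (average occupancy) or by executed instructions (average latency, which is Little's law), and each full count is divided by cycles. `avg_sim_slip` is `sim_slip` over committed instructions. The helper below is a hypothetical sketch whose results, rounded to four places, match the values printed above.

```python
def queue_stats(count: int, full_count: int, total_insn: int, cycles: int) -> dict:
    """Averages for one pipeline queue from its cumulative counters."""
    return {
        "occupancy": count / cycles,     # average entries in use per cycle
        "rate": total_insn / cycles,     # average dispatch rate (insn/cycle)
        "latency": count / total_insn,   # average cycles per occupant (Little's law)
        "full": full_count / cycles,     # fraction of cycles the queue was full
    }


CYCLES, TOTAL_INSN, NUM_INSN = 6836457, 13148990, 13107124
ifq = queue_stats(25451776, 6237341, TOTAL_INSN, CYCLES)
ruu = queue_stats(105086477, 5673502, TOTAL_INSN, CYCLES)
avg_slip = 154255817 / NUM_INSN   # sim_slip / sim_num_insn
print(round(ifq["occupancy"], 4), round(ruu["full"], 4), round(avg_slip, 4))
```

With the 16-entry RUU averaging about 15.4 occupants and full in about 83% of cycles, while the 32-entry LSQ is never full, the RUU looks like the tighter structure in this run.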
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
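The predictor rates are hit counts divided by the relevant event count. When the denominator is zero, sim-outorder prints `<error: divide by zero>`, as for `bpred_jr_non_ras_rate.PP` above; the sketch returns `None` in that case. In the matrix and sort runs here, `misses` also equals `updates - dir_hits`, which the sketch checks as an observed relation. `safe_rate` and `bpred_rates` are hypothetical names.

```python
from typing import Dict, Optional


def safe_rate(num: int, den: int) -> Optional[float]:
    """num/den, or None where sim-outorder would report a divide by zero."""
    return None if den == 0 else num / den


def bpred_rates(c: Dict[str, int]) -> Dict[str, object]:
    return {
        "bpred_addr_rate": safe_rate(c["addr_hits"], c["updates"]),
        "bpred_dir_rate": safe_rate(c["dir_hits"], c["updates"]),
        "bpred_jr_rate": safe_rate(c["jr_hits"], c["jr_seen"]),
        "bpred_jr_non_ras_rate": safe_rate(c["jr_non_ras_hits"], c["jr_non_ras_seen"]),
        "ras_rate": safe_rate(c["ras_hits"], c["used_ras"]),
        # Observed in these logs: every update is either a direction hit or a miss.
        "misses_consistent": c["updates"] - c["dir_hits"] == c["misses"],
    }


matrix_bpred = {
    "updates": 1010850, "addr_hits": 1000567, "dir_hits": 1000659, "misses": 10191,
    "jr_hits": 46, "jr_seen": 52, "jr_non_ras_hits": 0, "jr_non_ras_seen": 0,
    "ras_hits": 46, "used_ras": 52,
}
print(bpred_rates(matrix_bpred))
```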
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
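Each cache and TLB block reports rates as events per access, and hits plus misses equals accesses. The numbers here also show how the unified L2 is fed: `ul2.accesses` equals il1 misses plus dl1 misses plus dl1 writebacks (178 + 5727 + 492 = 6397 here, and 217 + 11205 + 7079 = 18501 for the sort run below). The sketch encodes that relation as an inference from these logs, not as a documented guarantee; `cache_rates` and `expected_l2_accesses` are hypothetical names.

```python
def cache_rates(accesses: int, hits: int, misses: int, repl: int, wbacks: int) -> dict:
    """Per-access rates as printed for each cache level."""
    if hits + misses != accesses:
        raise ValueError("hits + misses should equal accesses")
    return {
        "miss_rate": misses / accesses,
        "repl_rate": repl / accesses,
        "wb_rate": wbacks / accesses,
    }


def expected_l2_accesses(il1_misses: int, dl1_misses: int, dl1_writebacks: int) -> int:
    # Relation observed in these runs: L1 misses and dirty L1 evictions reach ul2.
    return il1_misses + dl1_misses + dl1_writebacks


dl1 = cache_rates(4013174, 4007447, 5727, 4703, 492)
ul2 = cache_rates(6397, 4129, 2268, 0, 0)
print({k: round(v, 4) for k, v in dl1.items()}, round(ul2["miss_rate"], 4))
print(expected_l2_accesses(178, 5727, 492))   # 6397
```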
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
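All of the statistics above share one line shape, `<name> <value> # <description>`, so they are easy to pull into a script, for example to compare the runs in this file. This is a minimal sketch under a few assumptions: values are decimal integers, decimals, `0x` hex, a `k` suffix taken to mean KiB (consistent with 24 pages of 4 KiB giving `96k`), or `<error: ...>`, which becomes `None`. Lines that do not fit the pattern, such as option settings (which start with `-`) and the help text, are skipped. Because names repeat across runs, it is best applied to one run's statistics at a time. `parse_stats` is a hypothetical name.

```python
import re
from typing import Dict, Optional, Union

# "<name> <value> # <description>"; names are identifiers with dots, e.g. dl1.miss_rate.
STAT_LINE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>.+?)\s+#\s*(?P<desc>.*)$")

Value = Optional[Union[int, float]]


def parse_value(raw: str) -> Value:
    if raw.startswith("<error"):
        return None                      # e.g. "<error: divide by zero>"
    if raw.startswith("0x"):
        return int(raw, 16)              # addresses such as ld_text_base
    if raw.endswith("k") and raw[:-1].isdigit():
        return int(raw[:-1]) * 1024      # assumed KiB, e.g. mem.page_mem
    try:
        return int(raw)
    except ValueError:
        return float(raw)


def parse_stats(text: str) -> Dict[str, Value]:
    stats: Dict[str, Value] = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m:
            stats[m["name"]] = parse_value(m["value"])
    return stats


sample = """\
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
"""
s = parse_stats(sample)
print(s["mem.page_mem"] == s["mem.page_count"] * 4096)            # True
print(round(s["mem.ptab_misses"] / s["mem.ptab_accesses"], 4))    # 0.0995
```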

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:00:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264737 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6388577 # total simulation time in cycles
sim_IPC                      1.8128 # instructions per cycle
sim_CPI                      0.5516 # cycles per instruction
sim_exec_BW                  1.9198 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18855131 # cumulative IFQ occupancy
IFQ_fcount                  3885623 # cumulative IFQ full count
ifq_occupancy                2.9514 # avg IFQ occupancy (insn's)
ifq_rate                     1.9198 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5373 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6082 # fraction of time (cycle's) IFQ was full
RUU_count                  77666788 # cumulative RUU occupancy
RUU_fcount                  3283689 # cumulative RUU full count
ruu_occupancy               12.1571 # avg RUU occupancy (insn's)
ruu_rate                     1.9198 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3325 # avg RUU occupant latency (cycle's)
ruu_full                     0.5140 # fraction of time (cycle's) RUU was full
LSQ_count                  32216710 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0429 # avg LSQ occupancy (insn's)
lsq_rate                     1.9198 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6268 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123950594 # total number of slip cycles
avg_sim_slip                10.7025 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:00:28 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
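The `-mem:lat`, `-mem:width` and L2 block-size settings above combine into the cost of fetching one L2 block from main memory. The sketch below assumes the stock sim-outorder model: a first-chunk latency, plus the inter-chunk latency for each further bus-width chunk. This build's `-mem:maxBurstLength` and `-mem:minBurstLength` options are not part of that model and are not represented, so treat the result as an approximation. The function name is hypothetical.

```python
def mem_block_latency(bsize: int, bus_width: int, first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one cache block from memory, ASSUMING the stock
    sim-outorder model (first chunk + inter_chunk per remaining chunk).
    The burst-length limits of this modified build are NOT modeled."""
    # Number of bus transfers, rounded up (same as C integer ceil division).
    chunks = (bsize + bus_width - 1) // bus_width
    if chunks <= 0:
        raise ValueError("block size must be positive")
    return first_chunk + inter_chunk * (chunks - 1)


# This run: 64-byte ul2 blocks, -mem:width 16, -mem:lat 72 6
print(mem_block_latency(64, 16, 72, 6))  # 72 + 3 * 6
```

With a 16-byte bus, a 64-byte block takes four transfers. Under this model, the memory portion of an L2 miss therefore costs 72 + 3 × 6 = 90 cycles.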

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if prefixed with a `+', is a value relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
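The sketch below shows one way to split a range string in this format into typed endpoints. It is not SimpleScalar's own range parser. The `symbols` dictionary is a hypothetical stand-in for program-symbol lookup. A `+` end is assumed to take the same kind (address, cycle or instruction count) as the start.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, Optional[int]]


def _endpoint(text: str, symbols: Dict[str, int]) -> Endpoint:
    """Classify one side: '@' address, '#' cycle count, else instruction count."""
    if not text:
        return ("default", None)  # omitted: start or end of execution
    kind = "insn"
    if text[0] == "@":
        kind, text = "addr", text[1:]
    elif text[0] == "#":
        kind, text = "cycle", text[1:]
    # Hypothetical symbol lookup; otherwise a decimal or 0x-prefixed number.
    value = symbols[text] if text in symbols else int(text, 0)
    return (kind, value)


def parse_ptrace_range(rng: str, symbols: Optional[Dict[str, int]] = None) -> Tuple[Endpoint, Endpoint]:
    symbols = symbols or {}
    start_txt, sep, end_txt = rng.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    start = _endpoint(start_txt, symbols)
    if end_txt.startswith("+"):
        # Relative end: offset from the start, same kind as the start (assumed).
        offset = _endpoint(end_txt[1:], symbols)[1]
        if start[1] is None or offset is None:
            raise ValueError("'+' end needs an explicit start and offset")
        end = (start[0], start[1] + offset)
    else:
        end = _endpoint(end_txt, symbols)
    return start, end


print(parse_ptrace_range("1000:+100"))
# 0x00400800 is a made-up address for 'main', for illustration only.
print(parse_ptrace_range("@main:+278", {"main": 0x00400800}))
```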

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
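The sketch below maps a `-bpred:2lev <l1size> <l2size> <hist_size> <xor>` setting (that is, N, M, W, X) onto the sample predictor names. It reads the PAp condition as M == N * 2^W: each of the N first-level history registers gets its own 2^W-entry counter table. The function name is hypothetical.

```python
def classify_2lev(n: int, m: int, w: int, x: int) -> str:
    """Name a -bpred:2lev N M W X configuration after the sample list.
    ASSUMES PAp means M == N * 2**W (one 2**W table per history register)."""
    if x == 1:
        # XOR of history and address: gshare when global and 2**W counters.
        return "gshare" if n == 1 and m == 2 ** w else "other (xor-indexed)"
    if n == 1:  # one global history register
        if m == 2 ** w:
            return "GAg"
        if m > 2 ** w:
            return "GAp"
    else:  # per-address history registers
        if m == 2 ** w:
            return "PAg"
        if m == n * 2 ** w:
            return "PAp"
    return "other"


# This run's (inactive, since -bpred is bimod) setting: -bpred:2lev 1 1024 8 0
print(classify_2lev(1, 1024, 8, 0))
```

Here 1024 > 2^8 = 256, so the stored 2-level setting describes a GAp predictor. It is unused in this run because `-bpred` is `bimod`.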

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
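The sketch below parses a `<name>:<nsets>:<bsize>:<assoc>:<repl>` string and derives capacity and address-field widths as nsets × bsize × assoc. For a TLB, the same product gives the address-space reach. It assumes the set count and block size must be powers of two. All names are hypothetical.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


def _is_pow2(v: int) -> bool:
    return v > 0 and (v & (v - 1)) == 0


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # bytes per block
    assoc: int
    repl: str

    @property
    def size_bytes(self) -> int:
        # Total capacity (for a TLB: bytes of address space mapped).
        return self.nsets * self.bsize * self.assoc

    @property
    def offset_bits(self) -> int:
        return self.bsize.bit_length() - 1  # log2 for a power of two

    @property
    def index_bits(self) -> int:
        return self.nsets.bit_length() - 1


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])
    # Assumed constraint: set count and block size are powers of two.
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)):
        raise ValueError(f"{spec}: nsets and bsize must be powers of two")
    return cfg


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(spec)
    print(f"{c.name}: {c.size_bytes // 1024} KiB, {c.assoc}-way {c.repl}, "
          f"{c.offset_bits} offset bits, {c.index_bits} index bits")
```

For this run's settings, dl1 (and likewise il1) is 64 KiB with 6 offset bits and 9 index bits, and ul2 is 512 KiB. The 32-set, 4-way dtlb with 4 KiB pages covers 512 KiB of address space.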

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375188 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9518653 # total simulation time in cycles
sim_IPC                      1.3995 # instructions per cycle
sim_CPI                      0.7146 # cycles per instruction
sim_exec_BW                  1.4052 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  37078995 # cumulative IFQ occupancy
IFQ_fcount                  9119104 # cumulative IFQ full count
ifq_occupancy                3.8954 # avg IFQ occupancy (insn's)
ifq_rate                     1.4052 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7722 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 149230569 # cumulative RUU occupancy
RUU_fcount                  8978167 # cumulative RUU full count
ruu_occupancy               15.6777 # avg RUU occupancy (insn's)
ruu_rate                     1.4052 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.1573 # avg RUU occupant latency (cycle's)
ruu_full                     0.9432 # fraction of time (cycle's) RUU was full
LSQ_count                  78576435 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2550 # avg LSQ occupancy (insn's)
lsq_rate                     1.4052 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.8748 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  247522170 # total number of slip cycles
avg_sim_slip                18.5815 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
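Several of the statistics above are derived from raw counters. The sketch below parses `name value # description` lines and recomputes IPC, CPI and the dl1 miss rate from the fft run's counters. It also checks a relationship that holds in this log: with il2 pointed at dl2, ul2 accesses equal il1 misses plus dl1 misses plus dl1 writebacks. That relationship is inferred from these numbers and is not a documented guarantee. The helper names are hypothetical.

```python
# Hypothetical helpers: parse sim-outorder stat lines and cross-check a few
# derived values against the raw counters (fft run excerpt below).

SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   9518653 # total simulation time in cycles
il1.misses                      727 # total number of misses
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
dl1.writebacks               143444 # total number of writebacks
ul2.accesses                 431893 # total number of accesses
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   760k # total size of memory pages allocated
"""


def parse_stats(text: str) -> dict:
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # banners, help text, '<error: divide by zero>' values
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            pass  # hex addresses, '760k', and other non-decimal values
    return stats


def derived(s: dict) -> dict:
    return {
        "ipc": s["sim_num_insn"] / s["sim_cycle"],
        "cpi": s["sim_cycle"] / s["sim_num_insn"],
        "dl1_miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        # Assumed: unified L2 is fed by il1 misses, dl1 misses, dl1 writebacks.
        "ul2_expected": s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"],
    }


stats = parse_stats(SAMPLE)
d = derived(stats)
print(f"IPC {d['ipc']:.4f}  CPI {d['cpi']:.4f}  dl1 miss {d['dl1_miss_rate']:.4f}")
print(d["ul2_expected"], stats["ul2.accesses"])
```

The same balance holds for the filter run below: 179 + 3131 + 791 = 4101 ul2 accesses.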

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:00:39 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12596378 # total simulation time in cycles
sim_IPC                      1.6973 # instructions per cycle
sim_CPI                      0.5892 # cycles per instruction
sim_exec_BW                  1.7029 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49229606 # cumulative IFQ occupancy
IFQ_fcount                 11688229 # cumulative IFQ full count
ifq_occupancy                3.9082 # avg IFQ occupancy (insn's)
ifq_rate                     1.7029 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2951 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200055443 # cumulative RUU occupancy
RUU_fcount                 12468104 # cumulative RUU full count
ruu_occupancy               15.8820 # avg RUU occupancy (insn's)
ruu_rate                     1.7029 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3265 # avg RUU occupant latency (cycle's)
ruu_full                     0.9898 # fraction of time (cycle's) RUU was full
LSQ_count                  64132571 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0914 # avg LSQ occupancy (insn's)
lsq_rate                     1.7029 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9898 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291970767 # total number of slip cycles
avg_sim_slip                13.6567 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:00:53 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
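The -mem:lat and -mem:width options above set how long an L2 miss waits on main memory. The stock SimpleScalar 3.0 sim-outorder charges the first-chunk latency plus the inter-chunk latency for each additional bus-width chunk of the block. The sketch below assumes that formula. The -mem:minBurstLength and -mem:maxBurstLength options in this build are not part of stock 3.0, and their effect is not modelled, so treat the results as approximate for these runs.

```python
def mem_access_latency(blk_sz: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to transfer one block from memory.

    Assumption: stock SimpleScalar 3.0 formula; burst-length limits ignored.
    """
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_sz + bus_width - 1) // bus_width  # ceil(blk_sz / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l); -mem:lat 72 6 in this run.
for width in (16, 32):
    print(f"-mem:width {width}: {mem_access_latency(64, 72, 6, width)} cycles")
```

Under the same assumption, the first run in this file (-mem:lat 91 10 -mem:width 8) would pay 91 + 7 * 10 = 161 cycles per 64-byte block.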

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced, and those that start with `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  If
  the second argument is prefixed with `+', it is relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
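The range grammar above can be illustrated with a small parser. This is a hypothetical sketch, not SimpleScalar's own range parser. It classifies each end point, lets a `+'-prefixed end inherit the start's kind, and resolves numeric relative ends (1000:+100 becomes 1100). Symbolic values such as main are kept as text, because resolving them requires the program's symbol table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

KINDS = {"@": "addr", "#": "cycle"}  # no prefix means an instruction count


@dataclass
class RangePos:
    kind: str                # "addr", "cycle" or "insn"
    value: Optional[str]     # number or program symbol as text; None if omitted
    relative: bool = False   # True when the end point was written with '+'


def _parse_pos(text: str) -> RangePos:
    if text and text[0] in KINDS:
        return RangePos(KINDS[text[0]], text[1:] or None)
    return RangePos("insn", text or None)


def parse_ptrace_range(spec: str) -> Tuple[RangePos, RangePos]:
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = _parse_pos(start_txt)
    if end_txt.startswith("+"):
        # A relative end is measured in the same units as the start.
        end = RangePos(start.kind, end_txt[1:] or None, relative=True)
    else:
        end = _parse_pos(end_txt)
    return start, end


def absolute_end(start: RangePos, end: RangePos) -> Optional[int]:
    """Resolve a numeric end point; raises ValueError for symbolic values."""
    if end.value is None:
        return None  # open-ended: trace to the end of execution
    delta = int(end.value, 0)  # accepts decimal or 0x-prefixed hex
    if not end.relative:
        return delta
    if start.value is None:
        raise ValueError("a relative end needs an explicit start")
    return int(start.value, 0) + delta


for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
    s, e = parse_ptrace_range(spec)
    print(spec, s.kind, s.value, "->", e.kind, absolute_end(s, e))
```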

  Branch predictor configuration examples for the 2-level predictor:
    Configurations:   N, M, W, X   (same order as -bpred:2lev)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
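The -bpred:2lev option in the listing above takes <l1size> <l2size> <hist_size> <xor>, i.e., N, M, W, X. The hypothetical helper below turns each sample scheme into those flags. It is a sketch, not part of SimpleScalar. The PAp sizing M = N * 2^W is an assumption (one W-bit history pattern per first-level entry), so check it against the simulator's predictor code before relying on it.

```python
def twolev_option(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Return sim-outorder flags for a 2-level predictor (hypothetical helper).

    Arguments are emitted in -bpred:2lev order: <l1size> <l2size> <hist_size> <xor>.
    Assumption: PAp uses M = N * 2^W second-level entries.
    """
    if scheme == "GAg":
        l1, l2, xor = 1, 2 ** w, 0
    elif scheme == "GAp":
        if m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        l1, l2, xor = 1, m, 0
    elif scheme == "PAg":
        l1, l2, xor = n, 2 ** w, 0
    elif scheme == "PAp":
        l1, l2, xor = n, n * 2 ** w, 0
    elif scheme == "gshare":
        l1, l2, xor = 1, 2 ** w, 1
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    return f"-bpred 2lev -bpred:2lev {l1} {l2} {w} {xor}"


print(twolev_option("gshare", w=10))
print(twolev_option("GAp", w=8, m=1024))  # same values as the 1 1024 8 0 default
```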

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
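From the <config> fields, a cache's capacity is nsets * bsize * assoc bytes, the block offset uses log2(bsize) address bits, and the set index uses log2(nsets) bits. For a TLB the block size is the page size, so the same product gives the TLB reach. The CacheConfig class below is a hypothetical sketch of that arithmetic, not SimpleScalar's own parser; it assumes power-of-two set counts and block sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical parser for <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @classmethod
    def parse(cls, text: str) -> "CacheConfig":
        name, nsets, bsize, assoc, repl = text.split(":")
        if repl not in ("l", "f", "r"):
            raise ValueError(f"unknown replacement policy {repl!r}")
        return cls(name, int(nsets), int(bsize), int(assoc), repl)

    @property
    def capacity_bytes(self) -> int:
        # For a TLB, bsize is the page size, so this is the TLB reach.
        return self.nsets * self.bsize * self.assoc

    @staticmethod
    def _log2(x: int) -> int:
        # Assumption: sizes are powers of two.
        if x <= 0 or x & (x - 1):
            raise ValueError(f"{x} is not a power of two")
        return x.bit_length() - 1

    @property
    def offset_bits(self) -> int:
        return self._log2(self.bsize)

    @property
    def index_bits(self) -> int:
        return self._log2(self.nsets)


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = CacheConfig.parse(spec)
    print(f"{c.name}: {c.capacity_bytes // 1024} KB, "
          f"{c.offset_bits} offset bits, {c.index_bits} index bits")
```

For the runs in this log, that gives 64 KB for dl1 and il1, 512 KB for ul2, and 512 KB of reach for the 32-set, 4-way dtlb.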

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
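When il2 is pointed at dl2, as in every run in this log, the unified ul2 cache serves three streams: instruction-cache misses, data-cache misses, and dirty dl1 write-backs. In both complete runs below, ul2.accesses equals il1.misses + dl1.misses + dl1.writebacks exactly (211 + 2572213 + 957274 = 3529698 for alphaBlend; 178 + 5727 + 492 = 6397 for matrix). The sketch encodes that observed relationship as a consistency check. It is derived from these logs, not a documented SimpleScalar invariant, and expected_ul2_accesses is a hypothetical name.

```python
def expected_ul2_accesses(il1_misses: int, dl1_misses: int,
                          dl1_writebacks: int) -> int:
    """Accesses a unified L2 should see, assuming it is fed only by
    L1 instruction misses, L1 data misses and L1 data write-backs."""
    return il1_misses + dl1_misses + dl1_writebacks


runs = {
    # name: (il1.misses, dl1.misses, dl1.writebacks, ul2.accesses)
    "alphaBlend": (211, 2572213, 957274, 3529698),
    "matrix": (178, 5727, 492, 6397),
}
for name, (il1_m, dl1_m, dl1_wb, ul2_acc) in runs.items():
    predicted = expected_ul2_accesses(il1_m, dl1_m, dl1_wb)
    print(f"{name}: predicted {predicted}, reported {ul2_acc}, "
          f"match={predicted == ul2_acc}")
```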



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861315 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37975575 # total simulation time in cycles
sim_IPC                      0.7336 # instructions per cycle
sim_CPI                      1.3631 # cycles per instruction
sim_exec_BW                  0.7337 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 151819625 # cumulative IFQ occupancy
IFQ_fcount                 37954666 # cumulative IFQ full count
ifq_occupancy                3.9978 # avg IFQ occupancy (insn's)
ifq_rate                     0.7337 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4491 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 607281578 # cumulative RUU occupancy
RUU_fcount                 37953914 # cumulative RUU full count
ruu_occupancy               15.9914 # avg RUU occupancy (insn's)
ruu_rate                     0.7337 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.7966 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 184580137 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8605 # avg LSQ occupancy (insn's)
lsq_rate                     0.7337 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6250 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  828370342 # total number of slip cycles
avg_sim_slip                29.7337 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
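The IFQ, RUU and LSQ summary lines appear to be derived from the cumulative counters: occupancy = count / sim_cycle, rate = sim_total_insn / sim_cycle, and latency = count / sim_total_insn, so occupancy = rate * latency (Little's law). These formulas are inferred from the printed values of the alphaBlend run above, so treat them as an assumption about how sim-outorder computes them. For this run, for example, the RUU holds about 15.99 of its 16 entries on average. queue_stats is a hypothetical helper.

```python
# Cumulative counters copied from the alphaBlend run above.
run = {
    "sim_cycle": 37975575,
    "sim_total_insn": 27861315,
    "IFQ_count": 151819625,
    "RUU_count": 607281578,
    "LSQ_count": 184580137,
}


def queue_stats(run: dict, queue: str) -> dict:
    count = run[f"{queue}_count"]
    cycles = run["sim_cycle"]
    insns = run["sim_total_insn"]
    return {
        "occupancy": count / cycles,  # avg entries in the queue per cycle
        "rate": insns / cycles,       # avg dispatch rate (insn/cycle)
        "latency": count / insns,     # avg cycles an instruction stays queued
    }


for q in ("IFQ", "RUU", "LSQ"):
    s = queue_stats(run, q)
    # Little's law: occupancy == rate * latency (exact up to float rounding).
    assert abs(s["occupancy"] - s["rate"] * s["latency"]) < 1e-9
    print(f"{q}: occupancy={s['occupancy']:.4f} "
          f"rate={s['rate']:.4f} latency={s['latency']:.4f}")
```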

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:01:17 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6812901 # total simulation time in cycles
sim_IPC                      1.9239 # instructions per cycle
sim_CPI                      0.5198 # cycles per instruction
sim_exec_BW                  1.9300 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25365952 # cumulative IFQ occupancy
IFQ_fcount                  6215885 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9300 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9291 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104742245 # cumulative RUU occupancy
RUU_fcount                  5652046 # cumulative RUU full count
ruu_occupancy               15.3741 # avg RUU occupancy (insn's)
ruu_rate                     1.9300 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9658 # avg RUU occupant latency (cycle's)
ruu_full                     0.8296 # fraction of time (cycle's) RUU was full
LSQ_count                  32003418 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6975 # avg LSQ occupancy (insn's)
lsq_rate                     1.9300 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4339 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153796793 # total number of slip cycles
avg_sim_slip                11.7338 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
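The "<error: divide by zero>" entry in the matrix run above is a rate whose denominator is zero: no non-RAS JRs were seen (jr_non_ras_seen.PP is 0). When post-processing these logs, a parser can skip non-numeric values and return no value for such rates instead of failing. The sketch below is hypothetical (parse_stats and ratio are illustrative names). It reads one run's "name value # description" lines, skipping hex addresses and sizes such as 96k. Because this file concatenates several runs, split it on the dashed separator lines first so that later runs do not overwrite earlier ones.

```python
import re
from typing import Dict, Optional

# "<name> <value> # <description>" with a single-token value.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> Dict[str, float]:
    """Collect numeric statistics from one run's stats block."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m is None:
            continue  # e.g. '<error: divide by zero>' spans several tokens
        name, value = m.groups()
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # hex addresses (0x00400000) and sizes such as '96k'
    return stats


def ratio(stats: Dict[str, float], num: str, den: str) -> Optional[float]:
    """num/den, or None where sim-outorder would report a divide by zero."""
    d = stats.get(den, 0.0)
    return None if d == 0 else stats.get(num, 0.0) / d


# Lines copied verbatim from the matrix run above.
sample = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6812901 # total simulation time in cycles
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
dl1.accesses                4013174 # total number of accesses
dl1.misses                     5727 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""
s = parse_stats(sample)
print(f"IPC = {ratio(s, 'sim_num_insn', 'sim_cycle'):.4f}")
print(f"dl1 miss rate = {ratio(s, 'dl1.misses', 'dl1.accesses'):.4f}")
print(ratio(s, "bpred_bimod.jr_non_ras_hits.PP", "bpred_bimod.jr_non_ras_seen.PP"))
```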

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:01:26 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
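
In this configuration, `-mem:lat` and `-mem:width` combine into a per-block memory latency. In stock SimpleScalar 3.0, sim-outorder splits a block into bus-width chunks. It charges `<first_chunk>` cycles for the first chunk and `<inter_chunk>` cycles for each remaining chunk. The Python sketch below reproduces that rule and makes three assumptions:

* The 64-byte L2 block from `ul2:1024:64:8:l` is the unit fetched from memory.
* `-mem:maxBurstLength` and `-mem:minBurstLength` are ignored. They appear to be local additions, and this log does not document what they do.
* `mem_access_latency` is a hypothetical Python mirror of the simulator's computation.

```python
def mem_access_latency(blk_sz, first_chunk, inter_chunk, bus_width):
    """Cycles to move one block over the memory bus (stock SimpleScalar chunk model)."""
    chunks = (blk_sz + bus_width - 1) // bus_width  # ceil(blk_sz / bus_width)
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)


# This run: -mem:lat 72 6 -mem:width 32, 64-byte L2 blocks -> 2 chunks.
print(mem_access_latency(64, 72, 6, 32))   # 78
# Earlier matrix run: -mem:lat 91 10 -mem:width 8 -> 8 chunks.
print(mem_access_latency(64, 91, 10, 8))   # 161
```

Under this model, the 8-byte bus in the matrix run pays seven inter-chunk transfers (70 cycles) per L2 miss. The 32-byte bus here pays only one (6 cycles).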

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264689 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6355913 # total simulation time in cycles
sim_IPC                      1.8222 # instructions per cycle
sim_CPI                      0.5488 # cycles per instruction
sim_exec_BW                  1.9297 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18734699 # cumulative IFQ occupancy
IFQ_fcount                  3855515 # cumulative IFQ full count
ifq_occupancy                2.9476 # avg IFQ occupancy (insn's)
ifq_rate                     1.9297 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5275 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6066 # fraction of time (cycle's) IFQ was full
RUU_count                  77184808 # cumulative RUU occupancy
RUU_fcount                  3253593 # cumulative RUU full count
ruu_occupancy               12.1438 # avg RUU occupancy (insn's)
ruu_rate                     1.9297 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2933 # avg RUU occupant latency (cycle's)
ruu_full                     0.5119 # fraction of time (cycle's) RUU was full
LSQ_count                  32108026 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0517 # avg LSQ occupancy (insn's)
lsq_rate                     1.9297 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6179 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123359930 # total number of slip cycles
avg_sim_slip                10.6515 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
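
Several summary figures in the `sort` statistics above are simple ratios of the raw counters. The Python sketch below recomputes them from counters copied from this log, as a sanity check. The `derived_metrics` helper is hypothetical, and its formulas are read off the stat descriptions. For example, `bpred_dir_rate` is described as all-hits/updates, and `sim_exec_BW` counts mis-speculated plus committed instructions per cycle.

```python
def derived_metrics(s):
    """Recompute ratio statistics from raw sim-outorder counters."""
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_exec_BW": s["sim_total_insn"] / s["sim_cycle"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        "ras_rate": s["bpred_bimod.ras_hits.PP"] / s["bpred_bimod.used_ras.PP"],
    }


# Counters copied from the sort run above.
sort_run = {
    "sim_num_insn": 11581513,
    "sim_total_insn": 12264689,
    "sim_cycle": 6355913,
    "sim_num_branches": 3128464,
    "bpred_bimod.dir_hits": 2990777,
    "bpred_bimod.updates": 3128464,
    "bpred_bimod.ras_hits.PP": 426681,
    "bpred_bimod.used_ras.PP": 426708,
}

for name, value in derived_metrics(sort_run).items():
    print(f"{name:16s} {value:.4f}")
```

Each printed value should match the corresponding line in the log to four decimal places.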

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:01:34 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
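
The cache and TLB geometries in this configuration use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format from sim-outorder's help text. Capacity is therefore nsets * bsize * assoc. For a TLB, the same product gives the reach, with the page size in the `bsize` field. The Python sketch below decodes the strings used here. `CacheConfig` and `parse_cache_config` are hypothetical names. The parser handles only full geometry strings, not the `dl1`/`dl2`/`none` aliases.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for TLBs)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def entries(self):
        # Total blocks (cache) or translations (TLB).
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self):
        # For a TLB this is its reach: entries * page size.
        return self.entries * self.bsize


def parse_cache_config(spec):
    """Decode a full <name>:<nsets>:<bsize>:<assoc>:<repl> string."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r} in {spec!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.entries} entries, {cfg.capacity_bytes // 1024} KiB")
```

With these settings, each L1 holds 64 KiB and the unified L2 holds 512 KiB. The data TLB's 128 entries cover 512 KiB of address space.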

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375092 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9293401 # total simulation time in cycles
sim_IPC                      1.4334 # instructions per cycle
sim_CPI                      0.6977 # cycles per instruction
sim_exec_BW                  1.4392 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36212259 # cumulative IFQ occupancy
IFQ_fcount                  8902420 # cumulative IFQ full count
ifq_occupancy                3.8966 # avg IFQ occupancy (insn's)
ifq_rate                     1.4392 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7074 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 145760823 # cumulative RUU occupancy
RUU_fcount                  8761507 # cumulative RUU full count
ruu_occupancy               15.6843 # avg RUU occupancy (insn's)
ruu_rate                     1.4392 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.8979 # avg RUU occupant latency (cycle's)
ruu_full                     0.9428 # fraction of time (cycle's) RUU was full
LSQ_count                  76511511 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2329 # avg LSQ occupancy (insn's)
lsq_rate                     1.4392 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7204 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  241987740 # total number of slip cycles
avg_sim_slip                18.1660 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
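
The `fft` cache counters above fit a unified L2 that is fed by both L1s. `ul2.accesses` equals `il1.misses + dl1.misses + dl1.writebacks`. That is 727 + 287722 + 143444 = 431893 here, and 217 + 11205 + 7079 = 18501 for `sort`.

The Python sketch below checks that identity and then plugs the miss rates into the textbook average-memory-access-time (AMAT) formula. The AMAT figure is only a rough estimate under stated assumptions:

* It uses the 3-cycle L1 and 12-cycle L2 hit latencies from this configuration.
* It assumes 72 + 6 = 78 cycles to fetch a 64-byte block over the 32-byte bus (the stock sim-outorder chunking rule).
* It ignores TLB misses, writeback timing, and the overlap an out-of-order core achieves.

`amat` is a hypothetical helper name.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_lat):
    """Textbook two-level average memory access time, in cycles."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_lat)


# Counters copied from the fft run above.
fft = {
    "il1.misses": 727,
    "dl1.accesses": 6169439,
    "dl1.misses": 287722,
    "dl1.writebacks": 143444,
    "ul2.accesses": 431893,
    "ul2.misses": 32394,
}

# The unified L2 sees every L1 miss plus every dirty dl1 writeback.
assert fft["ul2.accesses"] == fft["il1.misses"] + fft["dl1.misses"] + fft["dl1.writebacks"]

dl1_mr = fft["dl1.misses"] / fft["dl1.accesses"]  # ~0.0466, matches dl1.miss_rate
ul2_mr = fft["ul2.misses"] / fft["ul2.accesses"]  # ~0.0750, matches ul2.miss_rate
mem_lat = 72 + 6 * (64 // 32 - 1)                 # 78 cycles, assumed stock chunk model

print(f"data AMAT ~ {amat(3, dl1_mr, 12, ul2_mr, mem_lat):.2f} cycles")
```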

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:01:45 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
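All of the cache and TLB options above use the <name>:<nsets>:<bsize>:<assoc>:<repl> format. Capacity therefore follows directly from nsets * bsize * assoc, and -mem:lat and -mem:width together determine how long one L2 block takes to arrive from memory.

The sketch below assumes the stock sim-outorder memory model: the first-chunk latency, plus the inter-chunk latency for each remaining bus-width chunk. It does not model -mem:maxBurstLength or -mem:minBurstLength, because this output does not describe their effect. The helper names are illustrative only.

```python
import math

def parse_cache(config: str) -> tuple:
    """Split '<name>:<nsets>:<bsize>:<assoc>:<repl>' into its five fields."""
    name, nsets, bsize, assoc, repl = config.split(":")
    return name, int(nsets), int(bsize), int(assoc), repl

def capacity_bytes(config: str) -> int:
    """Capacity = sets * block size * ways. For a TLB the block size is
    the page size, so the same product gives the TLB reach."""
    _, nsets, bsize, assoc, _ = parse_cache(config)
    return nsets * bsize * assoc

def block_fetch_latency(bsize: int, width: int, first: int, inter: int) -> int:
    """Cycles to bring one block from memory, assuming the stock model:
    <first_chunk> + <inter_chunk> * (chunks - 1), chunks = ceil(bsize/width)."""
    chunks = math.ceil(bsize / width)
    return first + inter * (chunks - 1)

for cfg in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
            "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    print(f"{cfg:20s} {capacity_bytes(cfg) // 1024:4d} KiB")

# One 64-byte ul2 block under the three memory settings seen in this log.
for width, first, inter in ((8, 91, 10), (32, 72, 6), (64, 72, 6)):
    print(f"width {width:2d}, lat {first} {inter}: "
          f"{block_fetch_latency(64, width, first, inter)} cycles")
```

For this configuration the sketch gives:

- 64 KiB L1 caches and a 512 KiB unified L2.
- 256 KiB of ITLB reach and 512 KiB of DTLB reach.
- 161, 78, and 72 cycles per 64-byte block for the three memory settings.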

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12569426 # total simulation time in cycles
sim_IPC                      1.7009 # instructions per cycle
sim_CPI                      0.5879 # cycles per instruction
sim_exec_BW                  1.7065 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49130150 # cumulative IFQ occupancy
IFQ_fcount                 11663365 # cumulative IFQ full count
ifq_occupancy                3.9087 # avg IFQ occupancy (insn's)
ifq_rate                     1.7065 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2904 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199657031 # cumulative RUU occupancy
RUU_fcount                 12443240 # cumulative RUU full count
ruu_occupancy               15.8843 # avg RUU occupancy (insn's)
ruu_rate                     1.7065 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3080 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  64007627 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0923 # avg LSQ occupancy (insn's)
lsq_rate                     1.7065 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9840 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291447483 # total number of slip cycles
avg_sim_slip                13.6323 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
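Most derived values in a stats block are ratios of raw counters, so they can be recomputed from the log itself. The sketch below rechecks the filter run above. parse_stats is a hypothetical helper, not part of SimpleScalar; it reads "name value # description" lines.

The final check is that ul2 accesses equal dl1 misses plus dl1 writebacks plus il1 misses. This is an observation that holds for the runs in this section, not a documented invariant.

```python
def parse_stats(text: str) -> dict:
    """Parse sim-outorder 'name value # description' lines into {name: float}.

    Lines whose value is not a plain number (hex addresses such as
    0x00400000, sizes such as 760k, '<error: divide by zero>') and
    commented-out option lines are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue
    return stats

# Excerpt copied from the filter run above.
SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_cycle                  12569426 # total simulation time in cycles
il1.misses                      179 # total number of misses
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
dl1.writebacks                  791 # total number of writebacks
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
"""

s = parse_stats(SAMPLE)
ipc = s["sim_num_insn"] / s["sim_cycle"]
print(f"IPC {ipc:.4f}  CPI {1 / ipc:.4f}")
print(f"IPB {s['sim_num_insn'] / s['sim_num_branches']:.4f}")
print(f"dl1 miss rate {s['dl1.misses'] / s['dl1.accesses']:.4f}")
print(f"ul2 miss rate {s['ul2.misses'] / s['ul2.accesses']:.4f}")
# Traffic reaching the unified L2: dl1 misses, dl1 writebacks, il1 misses.
l2_traffic = s["dl1.misses"] + s["dl1.writebacks"] + s["il1.misses"]
print("ul2 accesses explained:", l2_traffic == s["ul2.accesses"])
```

The recomputed values (IPC 1.7009, CPI 0.5879, IPB 12.9780, dl1 miss rate 0.0006, ul2 miss rate 0.6069) agree with the printed lines. The same L2-traffic sum also holds for the alphaBlend run below: 2572213 + 957274 + 211 = 3529698.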

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:01:59 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
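The options -bpred bimod with -bpred:bimod 16384 select a bimodal predictor: a table of 2-bit saturating counters indexed by branch address. The sketch below illustrates that general technique only. Several parts of it are assumptions rather than a transcription of SimpleScalar's bpred.c:

- The index uses the PC bits above the 8-byte PISA instruction size, masked to the table size.
- Counters start in a weakly-not-taken state.
- The branch address and outcome pattern are made up for illustration.

```python
class Bimodal:
    """Table of 2-bit saturating counters; a value >= 2 predicts taken."""

    def __init__(self, size: int = 16384, inst_bytes: int = 8):
        assert size & (size - 1) == 0, "table size must be a power of two"
        self.size = size
        # Drop the PC bits that are always zero for aligned instructions.
        self.shift = inst_bytes.bit_length() - 1
        # Start every counter weakly not-taken (simplifying assumption).
        self.table = [1] * size

    def _index(self, pc: int) -> int:
        return (pc >> self.shift) & (self.size - 1)

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# A loop-closing branch: taken 9 times, then not taken once; loop run 10 times.
bp = Bimodal()
pc = 0x00400200   # hypothetical branch address inside the text segment
correct = total = 0
for _ in range(10):
    for outcome in [True] * 9 + [False]:
        correct += bp.predict(pc) == outcome
        total += 1
        bp.update(pc, outcome)
print(f"{correct}/{total} correct")
```

This prints 89/100 correct. Once the counter has warmed up, its hysteresis limits a loop-closing branch to one misprediction per loop exit. That behavior is consistent with the high bpred_dir_rate values printed for these kernels.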

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861219 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37432827 # total simulation time in cycles
sim_IPC                      0.7443 # instructions per cycle
sim_CPI                      1.3436 # cycles per instruction
sim_exec_BW                  0.7443 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 149658665 # cumulative IFQ occupancy
IFQ_fcount                 37414426 # cumulative IFQ full count
ifq_occupancy                3.9981 # avg IFQ occupancy (insn's)
ifq_rate                     0.7443 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3716 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 598637222 # cumulative RUU occupancy
RUU_fcount                 37413698 # cumulative RUU full count
ruu_occupancy               15.9923 # avg RUU occupancy (insn's)
ruu_rate                     0.7443 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.4864 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 181968325 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8612 # avg LSQ occupancy (insn's)
lsq_rate                     0.7443 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5312 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  817114222 # total number of slip cycles
avg_sim_slip                29.3296 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
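The memory lines of this run are consistent with 4 KB simulated pages: mem.page_mem equals mem.page_count × 4k here and in the runs that follow, and mem.ptab_miss_rate is misses / accesses. A small Python check of both. The 4096-byte page size is inferred from these numbers (it also matches the 4096-byte page size in the -tlb options), so treat it as an assumption; the helper names are hypothetical.

```python
PAGE_BYTES = 4096  # assumed simulated page size, inferred from page_mem / page_count


def page_mem_kb(page_count: int) -> int:
    """Total allocated page memory in KB, as mem.page_mem reports it."""
    return page_count * PAGE_BYTES // 1024


def ptab_miss_rate(misses: int, accesses: int) -> float:
    """First-level page table miss rate (misses / accesses)."""
    return misses / accesses


# (mem.page_count, mem.page_mem in KB) pairs printed by the runs in this log.
for count, reported_kb in [(370, 1480), (24, 96), (73, 292)]:
    assert page_mem_kb(count) == reported_kb

# Counters from the run above; the result matches its mem.ptab_miss_rate line.
print(f"{ptab_miss_rate(2888570, 152017734):.4f}")
```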

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:02:22 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
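The cache and TLB options above use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format described in sim-outorder's help text, so total capacity is nsets × bsize × assoc. A minimal Python sketch that decodes the strings used in this run; `decode_cache_config` and `CacheConfig` are hypothetical names, not part of SimpleScalar. For the TLBs, bsize is the 4096-byte page size, so the product is the TLB reach rather than storage.

```python
from dataclasses import dataclass

# Replacement-policy codes listed in sim-outorder's help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB)
    assoc: int
    repl: str   # decoded replacement policy name

    @property
    def blocks(self) -> int:
        """Total number of blocks (or TLB entries)."""
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        """Data capacity for a cache; address reach for a TLB."""
        return self.nsets * self.bsize * self.assoc


def decode_cache_config(spec: str) -> CacheConfig:
    """Split a <name>:<nsets>:<bsize>:<assoc>:<repl> string (hypothetical helper)."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL_POLICIES[repl])


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        cfg = decode_cache_config(spec)
        print(f"{cfg.name}: {cfg.blocks} blocks, {cfg.capacity_bytes // 1024} KB, {cfg.repl}")
```

For example, the 8192-block ul2 is larger than the ul2.misses count either of the runs below reports (2268 and 4196), which is consistent with ul2.replacements staying at 0.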

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6801123 # total simulation time in cycles
sim_IPC                      1.9272 # instructions per cycle
sim_CPI                      0.5189 # cycles per instruction
sim_exec_BW                  1.9334 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25323040 # cumulative IFQ occupancy
IFQ_fcount                  6205157 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insn's)
ifq_rate                     1.9334 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9259 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104570129 # cumulative RUU occupancy
RUU_fcount                  5641318 # cumulative RUU full count
ruu_occupancy               15.3754 # avg RUU occupancy (insn's)
ruu_rate                     1.9334 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9527 # avg RUU occupant latency (cycle's)
ruu_full                     0.8295 # fraction of time (cycle's) RUU was full
LSQ_count                  31945914 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6972 # avg LSQ occupancy (insn's)
lsq_rate                     1.9334 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4295 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153567281 # total number of slip cycles
avg_sim_slip                11.7163 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
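Most rate lines in the matrix run's statistics are formulas over its raw counters (the descriptions say so, e.g. miss rate is misses/ref). A short Python sketch recomputing a few of them. The identity ul2.accesses = il1.misses + dl1.misses + dl1.writebacks is an observation that holds for these numbers, assuming every L1 miss and every dirty L1 write-back goes to the unified L2 (il2 is pointed at dl2); the simulator does not print it.

```python
# Raw counters copied from the matrix run's statistics above.
stats = {
    "sim_num_insn": 13107124,
    "sim_num_branches": 1010850,
    "sim_total_insn": 13148990,
    "sim_cycle": 6801123,
    "il1.misses": 178,
    "dl1.accesses": 4013174,
    "dl1.hits": 4007447,
    "dl1.misses": 5727,
    "dl1.writebacks": 492,
    "ul2.accesses": 6397,
    "ul2.misses": 2268,
}

ipc = stats["sim_num_insn"] / stats["sim_cycle"]         # sim_IPC
cpi = stats["sim_cycle"] / stats["sim_num_insn"]         # sim_CPI
exec_bw = stats["sim_total_insn"] / stats["sim_cycle"]   # sim_exec_BW (includes mis-speculated insts)
ipb = stats["sim_num_insn"] / stats["sim_num_branches"]  # sim_IPB
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]

# Every access either hits or misses.
assert stats["dl1.hits"] + stats["dl1.misses"] == stats["dl1.accesses"]

# L2 traffic: L1 misses plus dirty L1 write-backs.
l2_traffic = stats["il1.misses"] + stats["dl1.misses"] + stats["dl1.writebacks"]
assert l2_traffic == stats["ul2.accesses"]

print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  exec_BW {exec_bw:.4f}  IPB {ipb:.4f}")
print(f"dl1 miss rate {dl1_miss_rate:.4f}  ul2 miss rate {ul2_miss_rate:.4f}")
```

The same L2 identity holds for the sort run's counters (217 + 11205 + 7079 = 18501).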

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:02:31 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
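As we understand the stock SimpleScalar 3.0 sim-outorder.c, a block fill from memory is split into bus-width chunks (-mem:width, in bytes): the first chunk costs the first `-mem:lat` value and each further chunk the second (`<first_chunk> <inter_chunk>`). This build also accepts -mem:maxBurstLength and -mem:minBurstLength, which appear to be local additions, so its real model may differ. The Python sketch below covers only the stock formula; `mem_access_latency` is used here as an illustrative name.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to transfer one block from memory under the stock sim-outorder model.

    The block moves in bus-width chunks: the first costs first_chunk cycles and
    each later chunk costs inter_chunk cycles. Burst-length limits are ignored.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (block_bytes + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # 64-byte ul2 blocks with -mem:width 64 -mem:lat 72 6 (this run): one chunk.
    print(mem_access_latency(64, 64, 72, 6))   # 72
    # 64-byte blocks with -mem:width 8 -mem:lat 91 10 (an earlier run in this log): eight chunks.
    print(mem_access_latency(64, 8, 91, 10))   # 161
```

Under this model the wide 64-byte bus turns every L2 miss into a single-chunk transfer, which is one way to read the difference between the two memory configurations in this log.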

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264665 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6339581 # total simulation time in cycles
sim_IPC                      1.8269 # instructions per cycle
sim_CPI                      0.5474 # cycles per instruction
sim_exec_BW                  1.9346 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18674483 # cumulative IFQ occupancy
IFQ_fcount                  3840461 # cumulative IFQ full count
ifq_occupancy                2.9457 # avg IFQ occupancy (insn's)
ifq_rate                     1.9346 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5226 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6058 # fraction of time (cycle's) IFQ was full
RUU_count                  76943818 # cumulative RUU occupancy
RUU_fcount                  3238545 # cumulative RUU full count
ruu_occupancy               12.1371 # avg RUU occupancy (insn's)
ruu_rate                     1.9346 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2736 # avg RUU occupant latency (cycle's)
ruu_full                     0.5108 # fraction of time (cycle's) RUU was full
LSQ_count                  32053684 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0561 # avg LSQ occupancy (insn's)
lsq_rate                     1.9346 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6135 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123064598 # total number of slip cycles
avg_sim_slip                10.6260 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:02:38 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
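The cache and TLB options above all use SimpleScalar's `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, so a structure's capacity is `nsets * bsize * assoc`. For the TLBs, `bsize` is the page size, so the same product gives TLB reach rather than storage. The Python sketch below decodes this run's strings and derives size and address-bit widths. `cache_geometry` and `CacheGeometry` are hypothetical helpers, not part of SimpleScalar, and the bit-width math assumes power-of-two set counts and block sizes, which SimpleScalar's cache module requires.

```python
from dataclasses import dataclass


@dataclass
class CacheGeometry:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        # Total bytes covered: sets * block (or page) size * ways.
        return self.nsets * self.bsize * self.assoc

    @property
    def offset_bits(self) -> int:
        # log2(bsize); valid only because bsize is a power of two.
        return self.bsize.bit_length() - 1

    @property
    def index_bits(self) -> int:
        # log2(nsets); valid only because nsets is a power of two.
        return self.nsets.bit_length() - 1


def cache_geometry(config: str) -> CacheGeometry:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (hypothetical helper)."""
    name, nsets, bsize, assoc, repl = config.split(":")
    if repl not in ("l", "f", "r"):  # LRU, FIFO, random
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheGeometry(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configuration strings taken from this run's option dump.
    for cfg in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
                "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        g = cache_geometry(cfg)
        print(f"{g.name}: {g.capacity // 1024} KiB, "
              f"{g.offset_bits} offset bits, {g.index_bits} index bits")
```

For `dl1:512:64:2:l` this yields 64 KiB with 6 offset and 9 index bits, and `ul2:1024:64:8:l` yields 512 KiB, so the unified L2 is eight times the size of each L1.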

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375044 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9180775 # total simulation time in cycles
sim_IPC                      1.4510 # instructions per cycle
sim_CPI                      0.6892 # cycles per instruction
sim_exec_BW                  1.4569 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  35778891 # cumulative IFQ occupancy
IFQ_fcount                  8794078 # cumulative IFQ full count
ifq_occupancy                3.8972 # avg IFQ occupancy (insn's)
ifq_rate                     1.4569 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6750 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 144025965 # cumulative RUU occupancy
RUU_fcount                  8653177 # cumulative RUU full count
ruu_occupancy               15.6878 # avg RUU occupancy (insn's)
ruu_rate                     1.4569 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.7683 # avg RUU occupant latency (cycle's)
ruu_full                     0.9425 # fraction of time (cycle's) RUU was full
LSQ_count                  75479055 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2214 # avg LSQ occupancy (insn's)
lsq_rate                     1.4569 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6433 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  239220546 # total number of slip cycles
avg_sim_slip                17.9583 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
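Several statistics in the `fft` block above are ratios of raw counters: `sim_IPC` is `sim_num_insn / sim_cycle`, `sim_CPI` is its inverse, `sim_exec_BW` is `sim_total_insn / sim_cycle`, `sim_IPB` is `sim_num_insn / sim_num_branches`, and each cache `miss_rate` is `misses / accesses`. The minimal Python sketch below assumes every statistic sits on one line as `name value # description`, parses such lines, and recomputes those ratios so they can be checked against the printed values. `parse_stats` and `derived` are hypothetical helper names, not SimpleScalar functions.

```python
import re

# Assumed line shape: "<name> <value> # <description>".
STAT_LINE = re.compile(r"^(\S+)\s+(.+?)\s+#\s")


def parse_stats(text: str) -> dict[str, float]:
    """Map statistic names to numeric values, skipping non-numeric entries."""
    stats: dict[str, float] = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m is None:
            continue
        try:
            stats[m.group(1)] = float(m.group(2))
        except ValueError:
            # Hex addresses, "292k", "<error: divide by zero>", option lists, ...
            continue
    return stats


def derived(stats: dict[str, float]) -> dict[str, float]:
    """Recompute formula statistics from their raw counters."""
    cycles = stats["sim_cycle"]
    insns = stats["sim_num_insn"]
    return {
        "sim_IPC": insns / cycles,
        "sim_CPI": cycles / insns,
        "sim_exec_BW": stats["sim_total_insn"] / cycles,
        "sim_IPB": insns / stats["sim_num_branches"],
        "dl1.miss_rate": stats["dl1.misses"] / stats["dl1.accesses"],
        "ul2.miss_rate": stats["ul2.misses"] / stats["ul2.accesses"],
    }


# Raw counters copied from the fft run above.
FFT_SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_num_branches             387182 # total number of branches committed
sim_total_insn             13375044 # total number of instructions executed
sim_cycle                   9180775 # total simulation time in cycles
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
ul2.accesses                 431893 # total number of accesses
ul2.misses                    32394 # total number of misses
"""

if __name__ == "__main__":
    for name, value in derived(parse_stats(FFT_SAMPLE)).items():
        print(f"{name:15s} {value:.4f}")
```

Rounded to four decimals, the recomputed values match the printed 1.4510, 0.6892, 1.4569, 34.4048, 0.0466 and 0.0750.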

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:02:49 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
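This run's memory options (`-mem:lat 72 6`, `-mem:width 64`) differ from the `matrix` run (`-mem:lat 91 10`, `-mem:width 8`). In stock SimpleScalar 3.0, `sim-outorder.c` charges a memory access `mem_lat[0] + mem_lat[1] * (chunks - 1)`, where `chunks` is the block size divided by the bus width, rounded up. The Python sketch below reproduces that arithmetic for a 64-byte L2 block under both settings. The `-mem:minBurstLength`/`-mem:maxBurstLength` options appear to come from a modified build and are ignored here, so the actual latencies in this build may differ.

```python
def mem_access_latency(blk_sz: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Stock sim-outorder memory latency model; burst-length options ignored."""
    # Number of bus transfers needed to move one block, rounded up.
    chunks = (blk_sz + bus_width - 1) // bus_width
    assert chunks > 0
    # The first chunk pays the full latency; each later chunk adds inter_chunk.
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # 64-byte blocks, as in ul2:1024:64:8:l.
    print("fft/filter:", mem_access_latency(64, 72, 6, 64))
    print("matrix:    ", mem_access_latency(64, 91, 10, 8))
```

Under this model a 64-bit-wide bus moves the whole block in one chunk (72 cycles), while the 8-byte bus of the `matrix` run needs eight chunks (91 + 7 * 10 = 161 cycles).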

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12555950 # total simulation time in cycles
sim_IPC                      1.7027 # instructions per cycle
sim_CPI                      0.5873 # cycles per instruction
sim_exec_BW                  1.7084 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49080422 # cumulative IFQ occupancy
IFQ_fcount                 11650933 # cumulative IFQ full count
ifq_occupancy                3.9089 # avg IFQ occupancy (insn's)
ifq_rate                     1.7084 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2881 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199457825 # cumulative RUU occupancy
RUU_fcount                 12430808 # cumulative RUU full count
ruu_occupancy               15.8855 # avg RUU occupancy (insn's)
ruu_rate                     1.7084 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2987 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  63945155 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0928 # avg LSQ occupancy (insn's)
lsq_rate                     1.7084 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9811 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291185841 # total number of slip cycles
avg_sim_slip                13.6200 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:03:03 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
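The `-mem:lat` and `-mem:width` values above can be read roughly as follows. Stock SimpleScalar 3.0 charges `<first_chunk> + <inter_chunk> * (chunks - 1)` cycles per memory access, where `chunks` is the block size divided by the bus width, rounded up. The sketch below reproduces only that stock formula, and the function name is illustrative. The `-mem:minBurstLength` and `-mem:maxBurstLength` options are not part of the stock release, so how this build applies them is an open assumption, and they are ignored here.

```python
def mem_access_latency(blk_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block using the stock SimpleScalar 3.0 formula.

    Assumption: the burst-length options of this build are ignored.
    """
    if blk_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    # Number of bus transfers needed to move one block, rounded up.
    chunks = (blk_size + bus_width - 1) // bus_width
    # The first chunk pays the full latency; each later chunk adds inter_chunk.
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks with the -mem:lat / -mem:width pairs that appear in this log.
for lat, width in [((72, 6), 64), ((72, 5), 8), ((91, 10), 8)]:
    print(lat, width, mem_access_latency(64, width, *lat))
```

For the 64-byte `ul2` blocks used in these runs, the stock formula gives 72 cycles for `-mem:lat 72 6 -mem:width 64`, 107 cycles for `-mem:lat 72 5 -mem:width 8`, and 161 cycles for `-mem:lat 91 10 -mem:width 8`, before any burst-length adjustment.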

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
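To make the range grammar concrete, here is a small Python parser for the `<start>:<end>` argument. It is an illustrative sketch, not SimpleScalar's own range parser: the `RangeEnd` type and the function names are hypothetical, program symbols such as `main` are left unresolved as strings, and a `+` end is assumed to inherit the units of the start.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Value = Union[int, str]


@dataclass
class RangeEnd:
    kind: str               # "addr" for '@', "cycle" for '#', "insn" for no prefix
    value: Value            # a number, or an unresolved program symbol like "main"
    relative: bool = False  # True when written as '+<n>' (offset from the start)


PREFIXES = {"@": "addr", "#": "cycle"}


def _parse_value(text: str) -> Value:
    # Accept decimal or 0x-prefixed hex; anything else is kept as a symbol name.
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)
    return text


def _parse_end(text: str) -> RangeEnd:
    kind = PREFIXES.get(text[0], "insn")
    body = text[1:] if text[0] in PREFIXES else text
    return RangeEnd(kind, _parse_value(body))


def parse_ptrace_range(spec: str) -> Tuple[Optional[RangeEnd], Optional[RangeEnd]]:
    """Split '<start>:<end>'; an empty side means 'unbounded'."""
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError("a range must contain ':'")
    start = _parse_end(start_txt) if start_txt else None
    if not end_txt:
        return start, None
    if end_txt[0] == "+":
        if start is None:
            raise ValueError("a '+' end needs a start to be relative to")
        # Assumption: the relative end uses the same units as the start.
        return start, RangeEnd(start.kind, _parse_value(end_txt[1:]), relative=True)
    return start, _parse_end(end_txt)


def absolute_end(start: RangeEnd, end: RangeEnd) -> Value:
    """Resolve a numeric '+' end, e.g. 1000:+100 -> 1100."""
    if end.relative and isinstance(start.value, int) and isinstance(end.value, int):
        return start.value + end.value
    return end.value
```

Each example above maps onto one call: `parse_ptrace_range("#0:#1000")` yields two cycle-count ends, `":"` yields `(None, None)` (trace everything), and `"@main:+278"` yields an address start on the symbol `main` with a relative end of 278.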

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
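The sample configurations are easiest to use when written out in the argument order of the `-bpred:2lev` option (`<l1size> <l2size> <hist_size> <xor>`, i.e. N, M, W, X). The helper below is a hypothetical convenience, not part of SimpleScalar. It fills in only the sizes that the samples fix, asks for an explicit M where a sample leaves it free, and does not check the PAp constraint.

```python
from typing import Optional


def two_level_args(scheme: str, w: int, n: int = 1, m: Optional[int] = None) -> str:
    """Return '<l1size> <l2size> <hist_size> <xor>' for a sample 2-level scheme.

    w is the history width W, n the first-level size N (per-address schemes),
    and m the second-level size M where the sample leaves it free (GAp, PAp).
    """
    s = scheme.lower()
    if s in ("gag", "gshare"):
        l1, l2 = 1, 2 ** w
    elif s == "pag":
        l1, l2 = n, 2 ** w
    elif s in ("gap", "pap"):
        if m is None:
            raise ValueError(f"{scheme} needs an explicit second-level size m")
        if s == "gap" and m <= 2 ** w:
            raise ValueError("GAp expects M > 2^W")
        l1, l2 = (1 if s == "gap" else n), m
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    xor = 1 if s == "gshare" else 0  # only gshare XORs history with the address
    return f"{l1} {l2} {w} {xor}"
```

For example, the default `-bpred:2lev 1 1024 8 0` in the option list has the GAp shape: one history register, W = 8, and M = 1024 > 2^8, which `two_level_args("GAp", 8, m=1024)` reproduces.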

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
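A cache's data capacity is simply `<nsets> * <assoc> * <bsize>`, so it can help to decode the `<config>` strings used on the command lines. This is a sketch with illustrative names (`CacheConfig`, `parse_cache_config`). It counts data bytes only, not tag or status bits.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def entries(self) -> int:
        # Total number of blocks (or TLB entries): sets x ways.
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        # Data capacity only; tag and status bits are not counted.
        return self.entries * self.bsize


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])
```

Applied to the runs here, `dl1:512:64:2:l` and `il1:512:64:2:l` are 64 KiB each, and `ul2:1024:64:8:l` is 512 KiB. The same string format describes the TLBs, where `<bsize>` is the page size, so `dtlb:32:4096:4:l` has 32 x 4 = 128 entries.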

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861171 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37161453 # total simulation time in cycles
sim_IPC                      0.7497 # instructions per cycle
sim_CPI                      1.3339 # cycles per instruction
sim_exec_BW                  0.7497 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 148578185 # cumulative IFQ occupancy
IFQ_fcount                 37144306 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7497 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3328 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 594315044 # cumulative RUU occupancy
RUU_fcount                 37143590 # cumulative RUU full count
ruu_occupancy               15.9928 # avg RUU occupancy (insn's)
ruu_rate                     0.7497 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.3313 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 180662419 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8616 # avg LSQ occupancy (insn's)
lsq_rate                     0.7497 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4844 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  811486162 # total number of slip cycles
avg_sim_slip                29.1276 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
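The derived statistics in the block above are plain ratios of the raw counters, for example `sim_IPC = sim_num_insn / sim_cycle` and `dl1.miss_rate = dl1.misses / dl1.accesses`, as the `# ...` descriptions state. The sketch below (hypothetical helper names) extracts numeric counters from lines in this `name value # description` format and recomputes such ratios. It returns `None` when the denominator is zero, which is the case the simulator prints as `<error: divide by zero>` in the matrix run that follows.

```python
import re
from typing import Dict, Iterable, Optional

# '<name>  <number> # <description>'; hex, '120k', '<error: ...>' etc. do not match.
STAT_LINE = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?)\s+#")


def parse_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Collect numeric counters from sim-outorder statistics lines."""
    stats: Dict[str, float] = {}
    for line in lines:
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


def ratio(stats: Dict[str, float], num: str, den: str) -> Optional[float]:
    """num/den, or None where the simulator would report a divide by zero."""
    denominator = stats[den]
    return None if denominator == 0 else stats[num] / denominator
```

On the alphaBlend counters above, `ratio(stats, "sim_num_insn", "sim_cycle")` rounds to the reported `sim_IPC` of 0.7497, and `ratio(stats, "dl1.misses", "dl1.accesses")` rounds to the reported `dl1.miss_rate` of 0.2972.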

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:03:26 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6869828 # total simulation time in cycles
sim_IPC                      1.9079 # instructions per cycle
sim_CPI                      0.5241 # cycles per instruction
sim_exec_BW                  1.9140 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25573360 # cumulative IFQ occupancy
IFQ_fcount                  6267737 # cumulative IFQ full count
ifq_occupancy                3.7226 # avg IFQ occupancy (insn's)
ifq_rate                     1.9140 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9449 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105574139 # cumulative RUU occupancy
RUU_fcount                  5703898 # cumulative RUU full count
ruu_occupancy               15.3678 # avg RUU occupancy (insn's)
ruu_rate                     1.9140 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0291 # avg RUU occupant latency (cycle's)
ruu_full                     0.8303 # fraction of time (cycle's) RUU was full
LSQ_count                  32281354 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6990 # avg LSQ occupancy (insn's)
lsq_rate                     1.9140 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4550 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154906101 # total number of slip cycles
avg_sim_slip                11.8185 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
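Every statistic above follows a fixed `<name> <value> # <description>` layout, so the derived rates can be re-checked from the raw counters. The Python sketch below assumes that layout holds for every stat line after the `sim: ** simulation statistics **` banner. `parse_stats` and `convert` are hypothetical helpers, not part of SimpleScalar.

```python
import re

# One stat line: "<name> <value> # <description>" (layout assumed from this log).
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def convert(token):
    """Return the value as int (decimal or 0x hex), else float, else the raw text."""
    try:
        return int(token, 0)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token  # e.g. "96k" for mem.page_mem


def parse_stats(text):
    """Collect {name: value} from the lines that follow the statistics banner."""
    stats, in_stats = {}, False
    for line in text.splitlines():
        if line.startswith("sim: ** simulation statistics **"):
            in_stats = True
        elif in_stats:
            match = STAT_LINE.match(line)
            if match:
                stats[match.group(1)] = convert(match.group(2))
    return stats


sample = """sim: ** simulation statistics **
ul2.accesses                   6397 # total number of accesses
ul2.misses                     2268 # total number of misses
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.page_mem                    96k # total size of memory pages allocated
ld_text_base             0x00400000 # program text (code) segment base
"""

stats = parse_stats(sample)
print(round(stats["ul2.misses"] / stats["ul2.accesses"], 4))            # 0.3545
print(round(stats["mem.ptab_misses"] / stats["mem.ptab_accesses"], 4))  # 0.0995
```

Values such as `96k` are kept as text instead of being guessed at, so a caller must decide how to interpret them.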

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput sort 

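This `sort` run uses the same `sim-outorder` flags as the `matrix` run at the top of this file. Only the memory timing differs. The Python sketch below makes such differences explicit by grouping each `-option` with the tokens that follow it. `parse_cmdline` and `diff_runs` are hypothetical helpers, and the sketch assumes the benchmark name is the last token and takes no arguments of its own.

```python
def parse_cmdline(cmd):
    """Split a 'sim: command line:' string into (benchmark, {option: value})."""
    tokens = cmd.split()
    if tokens[0] != "sim-outorder":
        raise ValueError("expected a sim-outorder command line")
    benchmark = tokens[-1]  # assumption: benchmark is last and has no arguments
    options, current = {}, None
    for tok in tokens[1:-1]:
        if tok.startswith("-"):
            current = tok
            options[current] = []
        elif current is not None:
            options[current].append(tok)
    return benchmark, {name: " ".join(vals) for name, vals in options.items()}


def diff_runs(cmd_a, cmd_b):
    """Return {option: (value_a, value_b)} for options whose values differ."""
    _, a = parse_cmdline(cmd_a)
    _, b = parse_cmdline(cmd_b)
    names = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in names if a.get(k) != b.get(k)}


COMMON = ("sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 "
          "-res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l "
          "-cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 "
          "-cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 ")
matrix_cmd = COMMON + "-mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput matrix"
sort_cmd = COMMON + "-mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput sort"

print(diff_runs(matrix_cmd, sort_cmd))
# {'-mem:lat': ('91 10', '72 5'), '-mem:minBurstLength': ('2', '4')}
```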
sim: simulation started @ Thu Dec 15 11:03:34 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
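Each cache and TLB string above uses the `<name>:<nsets>:<bsize>:<assoc>:<repl>` cache config format, so capacity is `nsets * bsize * assoc` bytes. For the TLBs the block size is the 4096-byte page. In that case `nsets * assoc` is the entry count, and multiplying by the page size gives the reach. The Python sketch below shows this; `geometry` and `Geometry` are hypothetical names.

```python
from typing import NamedTuple

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


class Geometry(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        return self.nsets * self.assoc   # total lines (or TLB entries)

    @property
    def capacity(self) -> int:
        return self.blocks * self.bsize  # data bytes (or TLB reach in bytes)


def geometry(config: str) -> Geometry:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' as used by -cache:* and -tlb:*."""
    name, nsets, bsize, assoc, repl = config.split(":")
    return Geometry(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for cfg in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
            "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    g = geometry(cfg)
    print(f"{g.name:5} {g.blocks:5} blocks/entries, {g.capacity // 1024:4} KiB, {g.assoc}-way {g.repl}")
```

For this configuration, dl1 and il1 come out at 64 KiB each and the unified ul2 at 512 KiB. The itlb has 64 entries covering 256 KiB, and the dtlb has 128 entries covering 512 KiB.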

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264805 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6434851 # total simulation time in cycles
sim_IPC                      1.7998 # instructions per cycle
sim_CPI                      0.5556 # cycles per instruction
sim_exec_BW                  1.9060 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19025743 # cumulative IFQ occupancy
IFQ_fcount                  3928276 # cumulative IFQ full count
ifq_occupancy                2.9567 # avg IFQ occupancy (insn's)
ifq_rate                     1.9060 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5512 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6105 # fraction of time (cycle's) IFQ was full
RUU_count                  78349665 # cumulative RUU occupancy
RUU_fcount                  3326325 # cumulative RUU full count
ruu_occupancy               12.1758 # avg RUU occupancy (insn's)
ruu_rate                     1.9060 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3882 # avg RUU occupant latency (cycle's)
ruu_full                     0.5169 # fraction of time (cycle's) RUU was full
LSQ_count                  32370723 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0305 # avg LSQ occupancy (insn's)
lsq_rate                     1.9060 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6393 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124787484 # total number of slip cycles
avg_sim_slip                10.7747 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
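Most ratios in the `sort` statistics are simple quotients of the raw counters printed in the same run. The numerator/denominator pairings below are inferred by matching the printed values rather than quoted from the simulator source. Each pairing reproduces the four-decimal figure above when formatted. The Python sketch uses only numbers copied from this run.

```python
# Raw counters copied from the sort run above.
raw = {
    "sim_num_insn": 11581513,
    "sim_total_insn": 12264805,
    "sim_num_branches": 3128464,
    "sim_cycle": 6434851,
    "IFQ_count": 19025743,
    "IFQ_fcount": 3928276,
    "RUU_count": 78349665,
    "sim_slip": 124787484,
    "bpred_bimod.updates": 3128464,
    "bpred_bimod.dir_hits": 2990777,
    "bpred_bimod.used_ras.PP": 426708,
    "bpred_bimod.ras_hits.PP": 426681,
}

# (numerator, denominator) pairs; the pairing is inferred from the printed values.
derived = {
    "sim_IPC": ("sim_num_insn", "sim_cycle"),
    "sim_CPI": ("sim_cycle", "sim_num_insn"),
    "sim_exec_BW": ("sim_total_insn", "sim_cycle"),
    "sim_IPB": ("sim_num_insn", "sim_num_branches"),
    "ifq_occupancy": ("IFQ_count", "sim_cycle"),
    "ifq_latency": ("IFQ_count", "sim_total_insn"),
    "ifq_full": ("IFQ_fcount", "sim_cycle"),
    "ruu_occupancy": ("RUU_count", "sim_cycle"),
    "ruu_latency": ("RUU_count", "sim_total_insn"),
    "avg_sim_slip": ("sim_slip", "sim_num_insn"),
    "bpred_bimod.bpred_dir_rate": ("bpred_bimod.dir_hits", "bpred_bimod.updates"),
    "bpred_bimod.ras_rate.PP": ("bpred_bimod.ras_hits.PP", "bpred_bimod.used_ras.PP"),
}

for name, (num, den) in derived.items():
    print(f"{name:28} {raw[num] / raw[den]:.4f}")
```

The occupancy figures divide by cycles, while the latency figures divide by instructions executed (including mis-speculated ones). This explains why `ifq_occupancy` and `ifq_latency` differ even though both use `IFQ_count`.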

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:03:42 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
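With `-mem:lat 72 5` and an 8-byte bus (`-mem:width 8`), the cost of an L2 miss depends on how many bus-width chunks make up the 64-byte `ul2` block. The stock sim-outorder timing model charges the first-chunk latency, plus the inter-chunk latency for each remaining chunk. This build also accepts `-mem:minBurstLength` and `-mem:maxBurstLength`, which appear to be local additions. The Python sketch below (hypothetical `mem_access_latency` helper) ignores them, so it may not match this modified simulator exactly.

```python
def mem_access_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to fetch one block: first chunk plus inter-chunk cost per remaining chunk."""
    chunks = (block_bytes + bus_width - 1) // bus_width  # round up to whole bus transfers
    if chunks <= 0:
        raise ValueError("block size must be positive")
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes and -mem:width is 8, so each miss moves 8 chunks.
print(mem_access_latency(64, 8, 72, 5))   # sort/fft runs (-mem:lat 72 5): 107
print(mem_access_latency(64, 8, 91, 10))  # matrix run (-mem:lat 91 10): 161
```

Under this model, the memory change between the `matrix` run and the `sort`/`fft` runs cuts the main-memory penalty per L2 miss from 161 to 107 cycles. That penalty comes on top of the 12-cycle L2 hit latency.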

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375321 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9837780 # total simulation time in cycles
sim_IPC                      1.3541 # instructions per cycle
sim_CPI                      0.7385 # cycles per instruction
sim_exec_BW                  1.3596 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  38306949 # cumulative IFQ occupancy
IFQ_fcount                  9426093 # cumulative IFQ full count
ifq_occupancy                3.8939 # avg IFQ occupancy (insn's)
ifq_rate                     1.3596 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.8640 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9582 # fraction of time (cycle's) IFQ was full
RUU_count                 154146431 # cumulative RUU occupancy
RUU_fcount                  9285124 # cumulative RUU full count
ruu_occupancy               15.6688 # avg RUU occupancy (insn's)
ruu_rate                     1.3596 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.5247 # avg RUU occupant latency (cycle's)
ruu_full                     0.9438 # fraction of time (cycle's) RUU was full
LSQ_count                  81501873 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2846 # avg LSQ occupancy (insn's)
lsq_rate                     1.3596 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.0935 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  255363141 # total number of slip cycles
avg_sim_slip                19.1701 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
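Each `*_rate` line divides the corresponding counter by `accesses` (the `/ref` in the descriptions). Recomputing them from the matrix run's il1/dl1/ul2 counters above reproduces the printed four-decimal values. The `cache_rates` helper is hypothetical; the numbers are copied by hand from this log.

```python
# Counters copied from the matrix run's il1/dl1/ul2 lines above.
caches = {
    # name: (accesses, misses, replacements, writebacks)
    "il1": (13400097, 727, 30, 0),
    "dl1": (6169439, 287722, 286698, 143444),
    "ul2": (431893, 32394, 24202, 19002),
}


def cache_rates(accesses: int, misses: int, replacements: int, writebacks: int):
    """Return (miss_rate, repl_rate, wb_rate), each normalized per access."""
    return misses / accesses, replacements / accesses, writebacks / accesses


for name, counts in caches.items():
    miss, repl, wb = cache_rates(*counts)
    print(f"{name}: miss_rate={miss:.4f} repl_rate={repl:.4f} wb_rate={wb:.4f}")
```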
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:03:53 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
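For a rough sense of what `-mem:lat 72 5` and `-mem:width 8` mean for this run: the stock sim-outorder memory model charges the first-chunk latency once, plus the inter-chunk latency for each further bus-width chunk of a block. The sketch below applies that to the 64-byte `ul2` blocks. It assumes the stock model and ignores `-mem:maxBurstLength`/`-mem:minBurstLength`, whose semantics are not described in this log; `block_fill_latency` is a hypothetical helper name.

```python
def block_fill_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block from memory, split into bus-width chunks.

    Assumes the stock model: first chunk + inter_chunk per remaining chunk.
    Burst-length options are not modeled.
    """
    chunks = -(-block_bytes // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# This run: -mem:lat 72 5, -mem:width 8, ul2 block size 64 bytes
print(block_fill_latency(64, 8, 72, 5))   # 72 + 5 * 7
# Matrix run: -mem:lat 91 10, same bus width and block size
print(block_fill_latency(64, 8, 91, 10))  # 91 + 10 * 7
```

Under those assumptions an L2 miss costs about 107 cycles here, against 161 cycles with the matrix run's `-mem:lat 91 10`.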

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
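The range grammar above can be sketched as a small parser. `parse_ptrace_range` and `Bound` are hypothetical names, not SimpleScalar's own code. Numeric values are read as decimal or `0x` hexadecimal (an assumption), and program symbols such as `main` are left as unresolved strings, so a `+` offset from a symbolic start stays relative.

```python
from typing import NamedTuple, Optional, Union

Value = Union[int, str]  # integer literal or unresolved program symbol


class Bound(NamedTuple):
    kind: str               # "addr" for '@', "cycle" for '#', "insn" otherwise
    value: Value
    relative: bool = False  # True while an end written with '+' is unresolved


def _parse_value(text: str) -> Value:
    if text.isdigit():
        return int(text)
    if text.lower().startswith("0x"):
        return int(text, 16)
    return text  # anything else is treated as a program symbol


def _parse_bound(text: str, allow_plus: bool) -> Optional[Bound]:
    if not text:
        return None  # this end of the range was omitted
    prefix = text[0]
    if prefix == "@":
        return Bound("addr", _parse_value(text[1:]))
    if prefix == "#":
        return Bound("cycle", _parse_value(text[1:]))
    if prefix == "+" and allow_plus:
        return Bound("rel", _parse_value(text[1:]), relative=True)
    return Bound("insn", _parse_value(text))


def parse_ptrace_range(spec: str):
    """Split '<start>:<end>' and resolve a '+' end against a numeric start."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':' separator")
    start = _parse_bound(start_text, allow_plus=False)
    end = _parse_bound(end_text, allow_plus=True)
    if end is not None and end.relative:
        if start is None:
            raise ValueError("a '+<end>' needs an explicit start")
        if isinstance(start.value, int) and isinstance(end.value, int):
            # e.g. 1000:+100 == 1000:1100; the end inherits the start's kind
            end = Bound(start.kind, start.value + end.value)
        else:
            # symbolic start such as @main: keep the offset for later resolution
            end = Bound(start.kind, end.value, relative=True)
    return start, end
```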

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N*2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
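On the command line, `-bpred:2lev` takes `<l1size> <l2size> <hist_size> <xor>`, i.e. N, M, W, X (see the option dump above). The hypothetical helper `twolev_args` below emits that string for each named scheme. For PAp it sizes the second level as N × 2^W (one 2^W-entry pattern table per first-level history register, reading N as a count of entries); this is our interpretation of how sim-outorder concatenates address bits above the history to form the second-level index, not something stated in the help text. The power-of-two check reflects that the table sizes are used as index masks.

```python
def is_power_of_two(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def twolev_args(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Return '<l1size> <l2size> <hist_size> <xor>' for -bpred:2lev.

    scheme: GAg, GAp, PAg, PAp or gshare
    w: history (shift register) width in bits
    n: first-level entries (used by PAg/PAp)
    m: second-level entries (used by GAp)
    """
    if scheme == "GAg":
        n, m, x = 1, 1 << w, 0
    elif scheme == "GAp":
        if m <= (1 << w):
            raise ValueError("GAp needs M > 2^W")
        n, x = 1, 0
    elif scheme == "PAg":
        m, x = 1 << w, 0
    elif scheme == "PAp":
        m, x = n * (1 << w), 0  # one 2^W pattern table per history register
    elif scheme == "gshare":
        n, m, x = 1, 1 << w, 1
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    if not (is_power_of_two(n) and is_power_of_two(m)):
        raise ValueError("table sizes must be powers of two")
    return f"{n} {m} {w} {x}"
```

For example, `twolev_args("GAp", 8, m=1024)` yields `1 1024 8 0`, which is the default `-bpred:2lev` value in the option dump.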

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
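Since capacity is just `<nsets>` × `<bsize>` × `<assoc>`, the configurations used in these runs are easy to size. A minimal sketch with a hypothetical `parse_cache_config` helper; for the TLB entries, `<bsize>` is the 4096-byte page size, so the product is the TLB's reach rather than a data capacity.

```python
from dataclasses import dataclass

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total bytes covered: sets * ways * block size."""
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(text: str) -> CacheConfig:
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL_POLICIES[repl])


for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity // 1024} KiB, {cfg.assoc}-way {cfg.repl}")
```

This gives 64 KiB for each L1, 512 KiB for `ul2`, and 256 KiB / 512 KiB of reach for `itlb` / `dtlb`.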

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12634560 # total simulation time in cycles
sim_IPC                      1.6921 # instructions per cycle
sim_CPI                      0.5910 # cycles per instruction
sim_exec_BW                  1.6977 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
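The rate lines above are plain ratios of the counters printed just before them, which makes them a handy sanity check when post-processing these logs. Recomputing them from the filter run's values (copied by hand into the dictionary below):

```python
# Counters copied from the filter run above.
stats = {
    "sim_num_insn": 21379256,      # committed instructions
    "sim_num_branches": 1647342,   # committed branches
    "sim_elapsed_time": 14,        # host seconds
    "sim_total_insn": 21450114,    # executed, including mis-speculated
    "sim_cycle": 12634560,
}

derived = {
    "sim_IPC": stats["sim_num_insn"] / stats["sim_cycle"],
    "sim_CPI": stats["sim_cycle"] / stats["sim_num_insn"],            # reciprocal of IPC
    "sim_exec_BW": stats["sim_total_insn"] / stats["sim_cycle"],      # counts wrong-path work too
    "sim_IPB": stats["sim_num_insn"] / stats["sim_num_branches"],
    "sim_inst_rate": stats["sim_num_insn"] / stats["sim_elapsed_time"],  # simulator speed
}

for name, value in derived.items():
    print(f"{name:14s} {value:.4f}")
```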
IFQ_count                  49370502 # cumulative IFQ occupancy
IFQ_fcount                 11723453 # cumulative IFQ full count
ifq_occupancy                3.9076 # avg IFQ occupancy (insn's)
ifq_rate                     1.6977 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3016 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200619860 # cumulative RUU occupancy
RUU_fcount                 12503328 # cumulative RUU full count
ruu_occupancy               15.8787 # avg RUU occupancy (insn's)
ruu_rate                     1.6977 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3529 # avg RUU occupant latency (cycle's)
ruu_full                     0.9896 # fraction of time (cycle's) RUU was full
LSQ_count                  64309575 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0900 # avg LSQ occupancy (insn's)
lsq_rate                     1.6977 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9981 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292712086 # total number of slip cycles
avg_sim_slip                13.6914 # the average slip between issue and retirement
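The queue averages follow the same pattern: occupancy divides the cumulative count by `sim_cycle`, latency divides it by `sim_total_insn` (equivalently occupancy divided by dispatch rate, i.e. Little's law), and the full fraction divides the full count by `sim_cycle`; `avg_sim_slip` is `sim_slip` per committed instruction. These formulas are inferred from the printed values rather than quoted from the simulator source. The sketch reproduces them for the filter run:

```python
# Cumulative counters copied from the filter run above.
SIM_CYCLE = 12634560
SIM_TOTAL_INSN = 21450114   # executed, including mis-speculated instructions
SIM_NUM_INSN = 21379256     # committed instructions

QUEUES = {
    # name: (cumulative occupancy count, cumulative full count)
    "ifq": (49370502, 11723453),
    "ruu": (200619860, 12503328),
    "lsq": (64309575, 0),
}


def queue_averages(count: int, fcount: int):
    """Return (occupancy, latency, full fraction) for one queue."""
    occupancy = count / SIM_CYCLE        # average entries held per cycle
    latency = count / SIM_TOTAL_INSN     # average cycles per executed instruction
    full = fcount / SIM_CYCLE            # fraction of cycles the queue was full
    return occupancy, latency, full


for name, (count, fcount) in QUEUES.items():
    occ, lat, full = queue_averages(count, fcount)
    print(f"{name}: occupancy={occ:.4f} latency={lat:.4f} full={full:.4f}")

avg_slip = 292712086 / SIM_NUM_INSN     # sim_slip per committed instruction
print(f"avg_sim_slip={avg_slip:.4f}")
```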
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
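The branch-predictor rates are again ratios of the counters above, which explains the `<error: divide by zero>` line: `jr_non_ras_seen.PP` is 0 in this run. A post-processing sketch that reproduces the printed rates and returns `None` for an empty denominator (the `rate` helper is hypothetical):

```python
from typing import Optional


def rate(hits: int, total: int) -> Optional[float]:
    """hits/total, or None when the denominator is zero."""
    return hits / total if total else None


# Counters copied from the filter run's bpred_bimod lines above.
bpred = {
    "updates": 1647342, "addr_hits": 1638967, "dir_hits": 1639058,
    "jr_hits": 45, "jr_seen": 52,
    "jr_non_ras_hits": 0, "jr_non_ras_seen": 0,
    "used_ras": 52, "ras_hits": 45,
}

rates = {
    "bpred_addr_rate": rate(bpred["addr_hits"], bpred["updates"]),
    "bpred_dir_rate": rate(bpred["dir_hits"], bpred["updates"]),
    "bpred_jr_rate": rate(bpred["jr_hits"], bpred["jr_seen"]),
    "bpred_jr_non_ras_rate": rate(bpred["jr_non_ras_hits"], bpred["jr_non_ras_seen"]),
    "ras_rate": rate(bpred["ras_hits"], bpred["used_ras"]),
}

for name, value in rates.items():
    print(name, "n/a" if value is None else f"{value:.4f}")
```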
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
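Because `-cache:il2 dl2` points the instruction side at the data L2, `ul2` is shared by both L1 caches. In all three runs in this section, `ul2.accesses` equals `il1.misses + dl1.misses + dl1.writebacks` exactly: every L1 miss, plus every dirty block written back from `dl1`, reaches `ul2`. This relation is observed in these numbers rather than taken from documentation. A quick check, with counters copied from the log:

```python
from typing import NamedTuple


class L1Counters(NamedTuple):
    il1_misses: int
    dl1_misses: int
    dl1_writebacks: int

    def expected_l2_accesses(self) -> int:
        # each L1 miss fetches a block from ul2; each dirty dl1 eviction writes one back
        return self.il1_misses + self.dl1_misses + self.dl1_writebacks


runs = {
    # run name: (L1 counters, reported ul2.accesses)
    "matrix": (L1Counters(727, 287722, 143444), 431893),
    "filter": (L1Counters(179, 3131, 791), 4101),
    "alphaBlend": (L1Counters(211, 2572213, 957274), 3529698),
}

for name, (l1, ul2_accesses) in runs.items():
    print(name, l1.expected_l2_accesses(), ul2_accesses)
```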
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
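The `mem.*` lines are consistent with 4096-byte simulator pages (the same page size as the TLB configs): 30 pages give 120k here, and 190 pages gave 760k in the matrix run. The page size is inferred from these numbers, and `page_mem_kib`/`ptab_miss_rate` are hypothetical helpers:

```python
PAGE_BYTES = 4096  # assumption: inferred from page_count vs. page_mem in these logs


def page_mem_kib(page_count: int) -> int:
    """Total allocated page memory in KiB."""
    return page_count * PAGE_BYTES // 1024


def ptab_miss_rate(misses: int, accesses: int) -> float:
    """First-level page table misses per page table access."""
    return misses / accesses


print(f"{page_mem_kib(30)}k", f"{ptab_miss_rate(12303390, 178886464):.4f}")  # filter run
print(f"{page_mem_kib(190)}k", f"{ptab_miss_rate(3262, 156766194):.4f}")     # matrix run
```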

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:04:07 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861451 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  38744468 # total simulation time in cycles
sim_IPC                      0.7191 # instructions per cycle
sim_CPI                      1.3907 # cycles per instruction
sim_exec_BW                  0.7191 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 154880985 # cumulative IFQ occupancy
IFQ_fcount                 38720006 # cumulative IFQ full count
ifq_occupancy                3.9975 # avg IFQ occupancy (insn's)
ifq_rate                     0.7191 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5590 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 619527749 # cumulative RUU occupancy
RUU_fcount                 38719220 # cumulative RUU full count
ruu_occupancy               15.9901 # avg RUU occupancy (insn's)
ruu_rate                     0.7191 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.2360 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 188280204 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8595 # avg LSQ occupancy (insn's)
lsq_rate                     0.7191 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.7577 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  844316512 # total number of slip cycles
avg_sim_slip                30.3060 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:04:31 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of instructions to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses OK)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
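The `-mem:lat <first_chunk> <inter_chunk>` and `-mem:width` options above combine into a per-block main-memory latency. The sketch below assumes the chunked model of stock sim-outorder: the first-chunk latency, plus the inter-chunk latency for each further bus-width transfer of the missing block. The `-mem:maxBurstLength` and `-mem:minBurstLength` options of this build are not modeled, and the function name is hypothetical.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to bring one cache block in from memory (simple chunked model).

    Assumption: burst-length limits are ignored; only -mem:lat and
    -mem:width are taken into account.
    """
    # Number of bus transfers, rounding a partial final chunk up.
    chunks = (block_size + bus_width - 1) // bus_width
    if chunks <= 0:
        raise ValueError("block_size and bus_width must be positive")
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks, this run's -mem:lat 72 5 -mem:width 16
print(mem_access_latency(64, 16, 72, 5))
# 64-byte ul2 blocks, the preceding run's -mem:lat 91 10 -mem:width 8
print(mem_access_latency(64, 8, 91, 10))
```

Under that assumption, a 64-byte `ul2` block costs 87 cycles with this run's settings and 161 cycles with the preceding run's `-mem:lat 91 10 -mem:width 8`.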

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
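The range grammar above can be sketched as a small parser. This is not SimpleScalar's own range code. Numbers are read with Python's `int(text, 0)`, so decimal and `0x` hex both work, and program symbols are resolved through a caller-supplied dictionary that stands in for the real symbol table. The names `Pos` and `parse_range` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Pos:
    kind: str   # "addr" for a `@' prefix, "cycle" for `#', "insn" for none
    value: int


def _number(text: str, symbols: Dict[str, int]) -> int:
    # Decimal or 0x-prefixed hex; otherwise look the text up as a symbol
    # in the hypothetical symbol table.
    try:
        return int(text, 0)
    except ValueError:
        if text in symbols:
            return symbols[text]
        raise ValueError(f"unknown number or symbol: {text!r}")


def _pos(text: str, symbols: Dict[str, int]) -> Optional[Pos]:
    if not text:
        return None                      # this end of the range is omitted
    kinds = {"@": "addr", "#": "cycle"}
    if text[0] in kinds:
        return Pos(kinds[text[0]], _number(text[1:], symbols))
    return Pos("insn", _number(text, symbols))


def parse_range(spec: str, symbols: Optional[Dict[str, int]] = None
                ) -> Tuple[Optional[Pos], Optional[Pos]]:
    symbols = symbols or {}
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    start = _pos(start_txt, symbols)
    if end_txt.startswith("+"):
        # `+' makes the end relative to the start, in the start's units.
        if start is None:
            raise ValueError("relative end needs an explicit start")
        return start, Pos(start.kind,
                          start.value + _number(end_txt[1:], symbols))
    return start, _pos(end_txt, symbols)
```

For example, `parse_range("1000:+100")` and `parse_range("1000:1100")` yield the same pair of instruction-count positions, and `parse_range(":")` yields `(None, None)`, meaning the whole execution.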

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
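A textbook-style sketch of how the N, M, W, X parameters shape a two-level predictor is shown below. It is not SimpleScalar's bpred.c. The class name, the weakly-not-taken initial counter value, and the 3-bit PC shift (assuming 8-byte PISA instructions) are assumptions. The constructor takes its arguments in the `-bpred:2lev <l1size> <l2size> <hist_size> <xor>` order from the option listing. Read that way, the listed default `1 1024 8 0` is N=1, M=1024, W=8, X=0, a GAp-style setup (M > 2^W).

```python
class TwoLevelPredictor:
    """Two-level branch direction predictor with 2-bit saturating counters."""

    def __init__(self, n: int, m: int, w: int, xor: int):
        # Argument order follows -bpred:2lev <l1size> <l2size> <hist_size> <xor>.
        for size in (n, m):
            if size <= 0 or size & (size - 1):
                raise ValueError("table sizes must be powers of two")
        self.n, self.m, self.w, self.xor = n, m, w, xor
        self.hist = [0] * n          # first level: N shift registers, W bits each
        self.ctr = [1] * m           # second level: M counters, weakly not-taken

    @staticmethod
    def _word(pc: int) -> int:
        return pc >> 3               # assumes 8-byte PISA instructions

    def _index(self, pc: int) -> int:
        word = self._word(pc)
        h = self.hist[word & (self.n - 1)]
        if self.xor:                 # gshare-style: hash history with address
            idx = h ^ word
        else:                        # address bits placed above the history
            idx = (word << self.w) | h
        return idx & (self.m - 1)

    def predict(self, pc: int) -> bool:
        return self.ctr[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.ctr[i] = min(3, self.ctr[i] + 1) if taken else max(0, self.ctr[i] - 1)
        r = self._word(pc) & (self.n - 1)
        self.hist[r] = ((self.hist[r] << 1) | int(taken)) & ((1 << self.w) - 1)


# GAg (1, 2^W, W, 0) with W=4 learns a strictly alternating branch.
p = TwoLevelPredictor(1, 16, 4, 0)
for k in range(20):
    p.update(0x400200, k % 2 == 0)
```

With X=0 and M = 2^W, the address bits are masked off and only the history selects a counter (GAg/PAg). A larger M keeps low address bits above the history (GAp/PAp), and X=1 hashes address and history together (gshare).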

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
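The `<name>:<nsets>:<bsize>:<assoc>:<repl>` format fixes the capacity (nsets x bsize x assoc) and how a byte address splits into tag, set index, and block offset. The sketch below parses a config string and performs that split. The power-of-two check is this sketch's own assumption, and the names `CacheConfig` and `parse_cache_config` are hypothetical. For a TLB config, `<bsize>` is the page size, so the same product gives the TLB's reach.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def size_bytes(self) -> int:
        # Capacity (or, for a TLB, address-space reach) = sets * block * ways.
        return self.nsets * self.bsize * self.assoc

    def split(self, addr: int):
        """Return (tag, set index, block offset) for a byte address."""
        offset_bits = self.bsize.bit_length() - 1
        index_bits = self.nsets.bit_length() - 1
        offset = addr & (self.bsize - 1)
        index = (addr >> offset_bits) & (self.nsets - 1)
        tag = addr >> (offset_bits + index_bits)
        return tag, index, offset


def _is_pow2(v: int) -> bool:
    return v > 0 and v & (v - 1) == 0


def parse_cache_config(text: str) -> CacheConfig:
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError("expected <name>:<nsets>:<bsize>:<assoc>:<repl>")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)) or cfg.assoc < 1:
        raise ValueError("nsets and bsize must be powers of two, assoc >= 1")
    return cfg
```

Applied to this run's options, `dl1:512:64:2:l` and `il1:512:64:2:l` are 64 KiB 2-way caches, `ul2:1024:64:8:l` is 512 KiB 8-way, and `dtlb:32:4096:4:l` (128 entries of 4 KiB pages) reaches 512 KiB.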

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
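The unification rules above amount to mapping each logical level to a physical cache. A minimal sketch follows, assuming only the accepted values shown in the option listing (`-cache:il1 {<config>|dl1|dl2|none}` and `-cache:il2 {<config>|dl2|none}`). It does not attempt to reproduce any consistency checks sim-outorder applies between levels, and the function names are hypothetical. The fully unified example relies on the `-cache:il2` default of `dl2` shown in the listing.

```python
from typing import Dict, Optional


def _cache_name(spec: str) -> Optional[str]:
    # "none" disables a level; otherwise the name is the first config field.
    return None if spec == "none" else spec.split(":")[0]


def resolve_hierarchy(il1: str, il2: str, dl1: str,
                      dl2: str) -> Dict[str, Optional[str]]:
    """Map each logical level to the physical cache that serves it."""
    levels = {"dl1": _cache_name(dl1), "dl2": _cache_name(dl2)}
    # An instruction level given as "dl1"/"dl2" reuses that data-side cache.
    levels["il1"] = levels[il1] if il1 in ("dl1", "dl2") else _cache_name(il1)
    levels["il2"] = levels["dl2"] if il2 == "dl2" else _cache_name(il2)
    return levels


this_run = resolve_hierarchy("il1:512:64:2:l", "dl2",
                             "dl1:512:64:2:l", "ul2:1024:64:8:l")
print(sorted({c for c in this_run.values() if c}))
```

For this run the distinct caches are `il1`, `dl1` and `ul2`, which matches the cache sections reported in the statistics below. No separate `il2` section appears there.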



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6830568 # total simulation time in cycles
sim_IPC                      1.9189 # instructions per cycle
sim_CPI                      0.5211 # cycles per instruction
sim_exec_BW                  1.9250 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25430320 # cumulative IFQ occupancy
IFQ_fcount                  6231977 # cumulative IFQ full count
ifq_occupancy                3.7230 # avg IFQ occupancy (insns)
ifq_rate                     1.9250 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9340 # avg IFQ occupant latency (cycles)
ifq_full                     0.9124 # fraction of time (cycles) IFQ was full
RUU_count                 105000419 # cumulative RUU occupancy
RUU_fcount                  5668138 # cumulative RUU full count
ruu_occupancy               15.3721 # avg RUU occupancy (insns)
ruu_rate                     1.9250 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9854 # avg RUU occupant latency (cycles)
ruu_full                     0.8298 # fraction of time (cycles) RUU was full
LSQ_count                  32089674 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6980 # avg LSQ occupancy (insns)
lsq_rate                     1.9250 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4405 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  154141061 # total number of slip cycles
avg_sim_slip                11.7601 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
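The `name value # description` statistic lines above are regular enough to parse, and several printed values can be recomputed from others. The sketch below uses hypothetical helper names and parses a few lines pasted from this run. It then checks three relations: sim_IPC = sim_num_insn / sim_cycle, miss_rate = misses / accesses, and ul2.accesses = il1.misses + dl1.misses + dl1.writebacks. The last relation holds exactly in all three runs of this log, which is consistent with il2 being pointed at dl2, so both L1 caches miss into the unified ul2.

```python
def parse_stats(text: str) -> dict:
    """Parse sim-outorder `name value # description` statistic lines."""
    stats = {}
    for line in text.splitlines():
        body, sep, _desc = line.partition(" # ")
        if not sep:
            continue                          # not a statistic line
        key, _, raw = body.strip().partition(" ")
        raw = raw.strip()
        for convert in (lambda s: int(s, 0), float):
            try:
                stats[key] = convert(raw)
                break
            except ValueError:
                pass
        else:
            stats[key] = raw                  # e.g. "96k" or "<error: divide by zero>"
    return stats


def ratio(num, den):
    """Quotient, or None where the simulator prints a divide-by-zero error."""
    return None if den == 0 else num / den


RUN = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6830568 # total simulation time in cycles
sim_IPC                      1.9189 # instructions per cycle
il1.misses                      178 # total number of misses
dl1.accesses                4013174 # total number of accesses
dl1.misses                     5727 # total number of misses
dl1.writebacks                  492 # total number of writebacks
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
ul2.accesses                   6397 # total number of accesses
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
mem.page_mem                    96k # total size of memory pages allocated
"""

s = parse_stats(RUN)
ipc = round(ratio(s["sim_num_insn"], s["sim_cycle"]), 4)
ul2_inflow = s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]
```

Values the simulator cannot compute, such as the 0/0 non-RAS JR rate, come back as `None` from `ratio` instead of the printed `<error: divide by zero>`.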

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:04:39 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of instructions to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses OK)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264725 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6380411 # total simulation time in cycles
sim_IPC                      1.8152 # instructions per cycle
sim_CPI                      0.5509 # cycles per instruction
sim_exec_BW                  1.9222 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18825023 # cumulative IFQ occupancy
IFQ_fcount                  3878096 # cumulative IFQ full count
ifq_occupancy                2.9504 # avg IFQ occupancy (insns)
ifq_rate                     1.9222 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5349 # avg IFQ occupant latency (cycles)
ifq_full                     0.6078 # fraction of time (cycles) IFQ was full
RUU_count                  77546293 # cumulative RUU occupancy
RUU_fcount                  3276165 # cumulative RUU full count
ruu_occupancy               12.1538 # avg RUU occupancy (insns)
ruu_rate                     1.9222 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3227 # avg RUU occupant latency (cycles)
ruu_full                     0.5135 # fraction of time (cycles) RUU was full
LSQ_count                  32189539 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0451 # avg LSQ occupancy (insns)
lsq_rate                     1.9222 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6246 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  123802928 # total number of slip cycles
avg_sim_slip                10.6897 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen          434901 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:04:47 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
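The `-mem:lat` pair gives the latency of the first chunk of a memory transfer and of each following chunk, and `-mem:width` sets the chunk size in bytes. The Python sketch below assumes the fill-latency formula of stock SimpleScalar 3.0 (`mem_access_latency()` in sim-outorder.c): a block moves in ceil(block size / bus width) chunks, paying the first-chunk latency once and the inter-chunk latency for every remaining chunk. The `-mem:maxBurstLength` and `-mem:minBurstLength` options appear to be additions to the stock 3.0 release, so this build may model bursts differently; `block_fill_latency` is a hypothetical helper, not a SimpleScalar function.

```python
def block_fill_latency(first_chunk: int, inter_chunk: int,
                       block_bytes: int, bus_width: int) -> int:
    """Cycles to bring one cache block in from memory (assumed stock 3.0 model)."""
    # Number of bus-width transfers, using integer ceiling division.
    chunks = (block_bytes + bus_width - 1) // bus_width
    if chunks < 1:
        raise ValueError("block size and bus width must be positive")
    # The first chunk pays the full access latency; each later chunk pays inter_chunk.
    return first_chunk + inter_chunk * (chunks - 1)


# This run: -mem:lat 72 5, -mem:width 16, 64-byte ul2 blocks -> 4 chunks.
print(block_fill_latency(72, 5, 64, 16))   # 72 + 3 * 5 = 87
# The matrix run's command line: -mem:lat 91 10, -mem:width 8 -> 8 chunks.
print(block_fill_latency(91, 10, 64, 8))   # 91 + 7 * 10 = 161
```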

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
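The range grammar above can be illustrated with a small parser. This is a sketch only: `parse_range` is a hypothetical name, numbers are assumed to be decimal or `0x` hex, program symbols such as `main` are kept as unresolved strings rather than looked up in the symbol table, and it does not check that both endpoints have the same kind.

```python
from typing import Optional, Tuple, Union

Value = Union[int, str]                  # a number, or an unresolved program symbol
Endpoint = Optional[Tuple[str, Value]]   # (kind, value), or None if omitted
PREFIXES = {"@": "address", "#": "cycle"}


def _endpoint(text: str) -> Tuple[str, Value]:
    """Split an optional '@'/'#' prefix off one endpoint and parse its value."""
    kind = PREFIXES.get(text[0], "insn")  # no prefix means an instruction count
    body = text[1:] if text[0] in PREFIXES else text
    try:
        return kind, int(body, 0)          # decimal or 0x-prefixed hex
    except ValueError:
        return kind, body                  # e.g. a symbol such as "main"


def parse_range(spec: str) -> Tuple[Endpoint, Endpoint]:
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError("a range needs a ':' separator")
    start = _endpoint(start_txt) if start_txt else None
    if not end_txt:
        return start, None
    if end_txt[0] != "+":
        return start, _endpoint(end_txt)
    # '+N' is relative to the start and inherits its kind.
    if start is None:
        raise ValueError("a '+' end needs a start")
    kind, base = start
    delta = int(end_txt[1:], 0)
    end = (kind, base + delta) if isinstance(base, int) else (kind, f"{base}+{delta}")
    return start, end


print(parse_range("1000:+100"))   # resolves to instruction counts 1000..1100
print(parse_range("@main:+278"))  # the symbol 'main' stays unresolved
```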

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
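The `-bpred:2lev` option takes its arguments as `<l1size> <l2size> <hist_size> <xor>`, that is N, M, W, X, while the sample table lists W before M. A sketch that derives the option string for the sample schemes, assuming the table's formulas; `two_level_args` is a hypothetical helper, and PAp is left out because its M must be chosen explicitly.

```python
def two_level_args(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Build the '<l1size> <l2size> <hist_size> <xor>' string for -bpred:2lev.

    w: history width W; n: first-level entries N (PAg only);
    m: second-level entries M (GAp only, must exceed 2**w).
    """
    if scheme == "GAg":
        n, m, xor = 1, 2 ** w, 0
    elif scheme == "GAp":
        if m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        n, xor = 1, 0
    elif scheme == "PAg":
        m, xor = 2 ** w, 0
    elif scheme == "gshare":
        n, m, xor = 1, 2 ** w, 1
    else:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # Option order is N, M, W, X even though the sample table lists W before M.
    return f"{n} {m} {w} {xor}"


print(two_level_args("gshare", 10))        # 1 1024 10 1
print(two_level_args("GAp", 8, m=1024))    # 1 1024 8 0, the -bpred:2lev value echoed above
```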

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
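Because a cache's capacity is just sets x ways x block size, the configurations used in this run are easy to size. A minimal sketch, with `CacheConfig` and `parse_cache_config` as hypothetical names. The same five-field format is used for the TLBs (`-tlb:itlb`, `-tlb:dtlb`), where the block size is the page size and the same product gives the TLB reach.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity(self) -> int:
        """Total bytes covered: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity // 1024} KiB")   # 64, 512 and 512 KiB
```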



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375164 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9462340 # total simulation time in cycles
sim_IPC                      1.4078 # instructions per cycle
sim_CPI                      0.7103 # cycles per instruction
sim_exec_BW                  1.4135 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
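The headline ratios are plain quotients of the counters printed just above them. A sketch that recomputes them for this fft run; `core_rates` is a hypothetical helper, and the mapping of each ratio to its counters is inferred from the descriptions and checked against the printed values.

```python
def core_rates(num_insn: int, total_insn: int, num_branches: int, cycles: int) -> dict:
    """Recompute sim-outorder's throughput ratios from the raw counters."""
    return {
        "sim_IPC": num_insn / cycles,         # committed insts per cycle
        "sim_CPI": cycles / num_insn,         # reciprocal of IPC
        "sim_exec_BW": total_insn / cycles,   # executed (incl. mis-speculated) insts per cycle
        "sim_IPB": num_insn / num_branches,   # committed insts per committed branch
    }


# sim_num_insn, sim_total_insn, sim_num_branches, sim_cycle from this run.
rates = core_rates(13320908, 13375164, 387182, 9462340)
for name, value in rates.items():
    print(f"{name:12s} {value:.4f}")
```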
IFQ_count                  36862311 # cumulative IFQ occupancy
IFQ_fcount                  9064933 # cumulative IFQ full count
ifq_occupancy                3.8957 # avg IFQ occupancy (insn's)
ifq_rate                     1.4135 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7560 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 148363110 # cumulative RUU occupancy
RUU_fcount                  8924002 # cumulative RUU full count
ruu_occupancy               15.6793 # avg RUU occupancy (insn's)
ruu_rate                     1.4135 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.0924 # avg RUU occupant latency (cycle's)
ruu_full                     0.9431 # fraction of time (cycle's) RUU was full
LSQ_count                  78060194 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2496 # avg LSQ occupancy (insn's)
lsq_rate                     1.4135 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.8362 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  246138530 # total number of slip cycles
avg_sim_slip                18.4776 # the average slip between issue and retirement
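Each queue's occupancy, rate, latency and full fraction follow from its cumulative counters, and latency equals occupancy divided by dispatch rate (Little's law). The sketch below reproduces the printed IFQ and RUU figures and the average slip for this run; the formulas are inferred from the counters and agree with the printed values to four decimals, and `queue_stats` is a hypothetical helper.

```python
def queue_stats(count: int, full_count: int, total_insn: int, cycles: int) -> dict:
    """Per-queue averages from the cumulative counters (IFQ, RUU or LSQ).

    count:      cumulative occupancy summed over all cycles (e.g. RUU_count)
    full_count: cycles in which the queue was full (e.g. RUU_fcount)
    """
    occupancy = count / cycles        # mean entries held per cycle
    rate = total_insn / cycles        # mean dispatch rate, insn/cycle
    return {
        "occupancy": occupancy,
        "rate": rate,
        "latency": occupancy / rate,  # Little's law: time in queue = occupancy / rate
        "full": full_count / cycles,
    }


CYCLES, TOTAL_INSN, NUM_INSN = 9462340, 13375164, 13320908  # this fft run
ifq = queue_stats(36862311, 9064933, TOTAL_INSN, CYCLES)
ruu = queue_stats(148363110, 8924002, TOTAL_INSN, CYCLES)
avg_slip = 246138530 / NUM_INSN  # sim_slip / committed instructions
print({k: round(v, 4) for k, v in ruu.items()}, round(avg_slip, 4))
```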
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
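The predictor rates are ratios of the counters above, and in every run in this log `misses` equals `updates - dir_hits`. A sketch with a guarded division that returns `None` where sim-outorder prints `<error: divide by zero>` (as happens for the non-RAS JR rate when no non-RAS JRs are seen); `ratio` and `bpred_summary` are hypothetical names.

```python
from typing import Optional


def ratio(num: int, den: int) -> Optional[float]:
    """num / den, or None where sim-outorder would report a divide by zero."""
    return num / den if den else None


def bpred_summary(updates: int, addr_hits: int, dir_hits: int,
                  jr_hits: int, jr_seen: int, used_ras: int, ras_hits: int,
                  jr_non_ras_hits: int, jr_non_ras_seen: int) -> dict:
    return {
        "misses": updates - dir_hits,   # holds for all three runs in this log
        "bpred_addr_rate": ratio(addr_hits, updates),
        "bpred_dir_rate": ratio(dir_hits, updates),
        "bpred_jr_rate": ratio(jr_hits, jr_seen),
        "bpred_jr_non_ras_rate": ratio(jr_non_ras_hits, jr_non_ras_seen),
        "ras_rate": ratio(ras_hits, used_ras),
    }


fft = bpred_summary(387182, 380510, 380716, 89850, 89860, 89859, 89850, 0, 1)
print(fft["misses"], round(fft["bpred_dir_rate"], 4))
```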
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
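Each cache rate is a count divided by `accesses`, and `hits + misses` equals `accesses`. The counts also show where L2 traffic comes from: in each run in this log, `ul2.accesses` equals `dl1.misses + dl1.writebacks + il1.misses`, so `ul2.miss_rate` is a local rate over that filtered stream, not over all memory references. A sketch using this run's numbers; `cache_rates` and `l2_accesses` are hypothetical helpers.

```python
def cache_rates(accesses: int, hits: int, misses: int,
                replacements: int, writebacks: int) -> dict:
    """Per-reference rates, each a count divided by accesses."""
    if hits + misses != accesses:
        raise ValueError("expected hits + misses == accesses")
    return {
        "miss_rate": misses / accesses,
        "repl_rate": replacements / accesses,
        "wb_rate": writebacks / accesses,
    }


def l2_accesses(dl1_misses: int, dl1_writebacks: int, il1_misses: int) -> int:
    """Traffic the L1 caches send to the unified L2: misses plus dirty writebacks."""
    return dl1_misses + dl1_writebacks + il1_misses


dl1 = cache_rates(6169439, 5881717, 287722, 286698, 143444)
ul2 = cache_rates(431893, 399499, 32394, 24202, 19002)
print(l2_accesses(287722, 143444, 727) == 431893)
```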
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
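To post-process logs like this one, each statistic line can be split into name, value and description. A minimal sketch, assuming the `<name> <value> # <description>` layout seen here; `parse_stats` and `page_mem_kib` are hypothetical. It also checks `mem.page_mem` against `mem.page_count`, assuming 4 KiB pages, which is consistent with the 4096-byte TLB pages configured for these runs and with both runs' counts (73 pages for 292k, 190 pages for 760k).

```python
import re

# "<name>  <value> # <description>"; option echoes ("-x ...") and indented help text do not match.
STAT_LINE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>.+?)\s+#\s*(?P<desc>.*)$")


def parse_stats(text: str) -> dict:
    """Map each statistic name to an int, a float, or the raw string."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match is None:
            continue
        raw = match.group("value")
        try:
            value = float(raw) if "." in raw else int(raw, 0)  # int(..., 0) accepts 0x hex
        except ValueError:
            value = raw  # e.g. "760k" or "<error: divide by zero>"
        stats[match.group("name")] = value
    return stats


def page_mem_kib(page_count: int, page_size: int = 4096) -> int:
    """Allocated simulator memory in KiB, assuming fixed-size pages."""
    return page_count * page_size // 1024


sample = """\
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
ld_text_base             0x00400000 # program text (code) segment base
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate
"""
stats = parse_stats(sample)
print(stats["ld_text_base"])
print(f"{page_mem_kib(stats['mem.page_count'])}k" == stats["mem.page_mem"])
```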

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:04:58 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12589640 # total simulation time in cycles
sim_IPC                      1.6982 # instructions per cycle
sim_CPI                      0.5889 # cycles per instruction
sim_exec_BW                  1.7038 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49204742 # cumulative IFQ occupancy
IFQ_fcount                 11682013 # cumulative IFQ full count
ifq_occupancy                3.9084 # avg IFQ occupancy (insn's)
ifq_rate                     1.7038 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2939 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199955840 # cumulative RUU occupancy
RUU_fcount                 12461888 # cumulative RUU full count
ruu_occupancy               15.8826 # avg RUU occupancy (insn's)
ruu_rate                     1.7038 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3219 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64101335 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0916 # avg LSQ occupancy (insn's)
lsq_rate                     1.7038 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9884 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291839946 # total number of slip cycles
avg_sim_slip                13.6506 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
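Every statistic above is a single `<name> <value> # <description>` line, and the `*_rate` lines are ratios of counters printed in the same block (the comment gives the formula, e.g. `misses/ref`, where "ref" is the access count). A minimal Python sketch, assuming only that line layout, that collects the stats into a dictionary and recomputes the dl1 and ul2 miss rates from the counters in the block above. `parse_stats` and `miss_rate` are hypothetical helpers, not part of SimpleScalar.

```python
import re

# One stat per line: "<name> <value> # <description>".  Values may be ints,
# floats, hex addresses, sizes such as "120k", or an "<error: ...>" marker.
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#\s*(.*)$")

def parse_stats(text):
    """Return {name: value} for each stat line in a sim-outorder dump."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if not m:
            continue
        name, raw = m.group(1), m.group(2)
        try:
            stats[name] = int(raw)
        except ValueError:
            try:
                stats[name] = float(raw)
            except ValueError:
                stats[name] = raw  # keep hex bases, "120k", "<error: ...>" as text
    return stats

def miss_rate(stats, cache):
    """misses / accesses, the 'misses/ref' ratio printed as <cache>.miss_rate."""
    return stats[f"{cache}.misses"] / stats[f"{cache}.accesses"]

sample = """\
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
"""
s = parse_stats(sample)
print(f"{miss_rate(s, 'dl1'):.4f}")  # 0.0006
print(f"{miss_rate(s, 'ul2'):.4f}")  # 0.6069
```

The recomputed values match the printed `dl1.miss_rate` and `ul2.miss_rate`, which is a quick sanity check when post-processing many runs.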

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:05:12 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
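In the stock sim-outorder, a main-memory access for one block costs the first-chunk latency plus the inter-chunk latency for every additional bus-width chunk, where the chunk count is the block size divided by -mem:width, rounded up. For this run's 64-byte ul2 blocks with -mem:lat 72 5 and -mem:width 16, that is 72 + 3 * 5 = 87 cycles. This build also accepts -mem:maxBurstLength and -mem:minBurstLength, which appear to be local additions (they are not in the stock 3.0 option set); the sketch below ignores them, so treat its numbers as the stock-model baseline only.

```python
def mem_access_latency(blk_sz, first_chunk, inter_chunk, bus_width):
    """Cycles to fetch one block from main memory (stock sim-outorder model):
    the first bus-width chunk costs `first_chunk` cycles, each further chunk
    costs `inter_chunk`.  Burst-length options are NOT modeled."""
    chunks = (blk_sz + bus_width - 1) // bus_width  # ceil(blk_sz / bus_width)
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)

# ul2 blocks are 64 bytes (-cache:dl2 ul2:1024:64:8:l), -mem:lat 72 5
print(mem_access_latency(64, 72, 5, 16))  # 87  (-mem:width 16, this run)
print(mem_access_latency(64, 72, 5, 32))  # 77  (-mem:width 32)
```

Doubling the bus width only removes inter-chunk transfers, so with a large first-chunk latency the gain per miss is modest (87 vs. 77 cycles here).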

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
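The range grammar above can be sketched as a small parser. This Python version is a minimal sketch under stated assumptions: it handles numeric endpoints only (decimal or 0x-prefixed hex; program symbols such as `main` are not resolved), turns a `+` end into an absolute value relative to the start, and rejects endpoints of different kinds, which is an assumption about what sim-outorder accepts. `parse_ptrace_range` is a hypothetical name.

```python
KINDS = {"@": "addr", "#": "cycle"}  # no prefix -> instruction count

def _parse_pos(text):
    """Split an optional @/# prefix off one endpoint; (None, None) if empty."""
    if not text:
        return None, None
    kind = KINDS.get(text[0], "insn")
    digits = text[1:] if text[0] in KINDS else text
    # Numeric values only (decimal or 0x-hex); symbols are NOT resolved here.
    return kind, int(digits, 0)

def parse_ptrace_range(spec):
    """Parse '{{@|#}<start>}:{{@|#|+}<end>}' into (kind, start, end).
    Open ends are None; a '+' end is made absolute relative to start."""
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    s_kind, start = _parse_pos(start_txt)
    if end_txt.startswith("+"):
        if start is None:
            raise ValueError("a '+' end needs an explicit start")
        e_kind, end = s_kind, start + int(end_txt[1:], 0)
    else:
        e_kind, end = _parse_pos(end_txt)
    if s_kind and e_kind and s_kind != e_kind:
        raise ValueError("range endpoints must be of the same kind")
    return s_kind or e_kind or "insn", start, end

print(parse_ptrace_range("#0:#1000"))   # ('cycle', 0, 1000)
print(parse_ptrace_range("@2000:"))     # ('addr', 2000, None)
print(parse_ptrace_range(":1500"))      # ('insn', None, 1500)
print(parse_ptrace_range("1000:+100"))  # ('insn', 1000, 1100)
```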

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (listed in -bpred:2lev argument order N, M, W, X):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
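The -bpred:2lev option line above takes its four values in the order <l1size> <l2size> <hist_size> <xor>, i.e. N, M, W, X; the default `1 1024 8 0` is a GAp-style setup with W = 8 and M = 1024 > 2^8. A minimal Python sketch that builds that argument string for the GAg, GAp, PAg and gshare organisations; `twolev_args` is a hypothetical helper, not a SimpleScalar tool.

```python
def twolev_args(scheme, w, n=1, m=None):
    """Build the -bpred:2lev string "<l1size> <l2size> <hist_size> <xor>"
    (N M W X) for a sample 2-level organisation.
    Hypothetical helper; covers GAg, GAp, PAg and gshare only."""
    if scheme == "GAg":        # one global history register, 2^W counters
        n, m, x = 1, 2 ** w, 0
    elif scheme == "gshare":   # GAg sizing, history XORed with branch address
        n, m, x = 1, 2 ** w, 1
    elif scheme == "PAg":      # N per-address histories share 2^W counters
        m, x = 2 ** w, 0
    elif scheme == "GAp":      # global history, more than 2^W counters
        n, x = 1, 0
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
    else:
        raise ValueError(f"unsupported scheme: {scheme}")
    return f"{n} {m} {w} {x}"

print(twolev_args("GAp", 8, m=1024))  # 1 1024 8 0  (the default above)
print(twolev_args("gshare", 12))      # 1 4096 12 1
```

The result would be passed together with the predictor type, e.g. `-bpred 2lev -bpred:2lev 1 4096 12 1` for a 12-bit gshare.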

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
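A cache's capacity follows directly from the <config> fields: nsets * bsize * assoc bytes. For this run, `dl1:512:64:2:l` is 64 KB and `ul2:1024:64:8:l` is 512 KB; for a TLB such as `dtlb:32:4096:4:l` the "block" is a 4 KB page, so nsets * assoc gives the entry count (128). A minimal Python sketch of a parser for that format; `CacheConfig` and `parse_cache_config` are hypothetical names, and the sketch does not handle the `none`, `dl1` or `dl2` alias values.

```python
from typing import NamedTuple

class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # bytes per block (page size for a TLB)
    assoc: int
    repl: str    # 'l' LRU, 'f' FIFO, 'r' random

    @property
    def size_bytes(self):
        return self.nsets * self.bsize * self.assoc

def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' as used by -cache:* and -tlb:*."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)

print(parse_cache_config("dl1:512:64:2:l").size_bytes // 1024)   # 64
print(parse_cache_config("ul2:1024:64:8:l").size_bytes // 1024)  # 512
tlb = parse_cache_config("dtlb:32:4096:4:l")
print(tlb.nsets * tlb.assoc)                                     # 128
```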



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861291 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37839888 # total simulation time in cycles
sim_IPC                      0.7363 # instructions per cycle
sim_CPI                      1.3582 # cycles per instruction
sim_exec_BW                  0.7363 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 151279385 # cumulative IFQ occupancy
IFQ_fcount                 37819606 # cumulative IFQ full count
ifq_occupancy                3.9979 # avg IFQ occupancy (insn's)
ifq_rate                     0.7363 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4297 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 605120489 # cumulative RUU occupancy
RUU_fcount                 37818860 # cumulative RUU full count
ruu_occupancy               15.9916 # avg RUU occupancy (insn's)
ruu_rate                     0.7363 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.7190 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 183927184 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8607 # avg LSQ occupancy (insn's)
lsq_rate                     0.7363 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6015 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  825556312 # total number of slip cycles
avg_sim_slip                29.6326 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
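The IFQ, RUU and LSQ lines are derived from two raw counters per queue. The values in the alphaBlend run above are consistent with occupancy = count / sim_cycle, rate = sim_total_insn / sim_cycle, latency = occupancy / rate (Little's law, L = lambda * W, rearranged) and full = fcount / sim_cycle. A short Python sketch that reproduces the printed IFQ figures from the raw counters; `queue_stats` is a hypothetical helper name.

```python
def queue_stats(count, fcount, total_insn, cycles):
    """Per-queue averages derived from sim-outorder's raw counters.
    count:  cumulative occupancy (entries summed over all cycles)
    fcount: number of cycles the queue was full"""
    occupancy = count / cycles        # avg entries in the queue (L)
    rate = total_insn / cycles        # avg dispatch rate, insn/cycle (lambda)
    latency = occupancy / rate        # avg cycles an insn stays queued (W)
    full = fcount / cycles            # fraction of cycles the queue was full
    return occupancy, rate, latency, full

# alphaBlend run, IFQ counters
occ, rate, lat, full = queue_stats(151279385, 37819606, 27861291, 37839888)
print(f"{occ:.4f} {rate:.4f} {lat:.4f} {full:.4f}")  # 3.9979 0.7363 5.4297 0.9995
```

Feeding it the RUU counters (605120489 and 37818860) with the same totals reproduces ruu_occupancy 15.9916 and ruu_latency 21.7190: the 16-entry RUU holds almost 16 instructions on average, matching ruu_full 0.9994.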

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:05:35 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6942459 # total simulation time in cycles
sim_IPC                      1.8880 # instructions per cycle
sim_CPI                      0.5297 # cycles per instruction
sim_exec_BW                  1.8940 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25837984 # cumulative IFQ occupancy
IFQ_fcount                  6333893 # cumulative IFQ full count
ifq_occupancy                3.7217 # avg IFQ occupancy (insn's)
ifq_rate                     1.8940 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9650 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 106635521 # cumulative RUU occupancy
RUU_fcount                  5770054 # cumulative RUU full count
ruu_occupancy               15.3599 # avg RUU occupancy (insn's)
ruu_rate                     1.8940 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.1098 # avg RUU occupant latency (cycle's)
ruu_full                     0.8311 # fraction of time (cycle's) RUU was full
LSQ_count                  32635962 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7009 # avg LSQ occupancy (insn's)
lsq_rate                     1.8940 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4820 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  156321425 # total number of slip cycles
avg_sim_slip                11.9264 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
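Every rate in the statistics block above is a plain ratio of two counters, as the trailing comments say (for example, ul2.miss_rate is misses/ref). The Python sketch below is not part of SimpleScalar. It reads a dump like this one into a dictionary and recomputes such a ratio. The regular expression assumes the `name value # description` layout seen here, and non-numeric values such as `96k` or `<error: divide by zero>` are kept as plain strings. The names `parse_stats` and `ratio` are hypothetical.

```python
import re

# A stat line looks like "<name> <value> # <description>". The value can
# contain spaces (e.g. "<error: divide by zero>"), so match lazily up to " #".
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#")


def parse_value(text):
    """Convert a printed stat value to int/float; keep anything else as str."""
    try:
        return int(text, 0)        # decimal or 0x-prefixed hex
    except ValueError:
        pass
    try:
        return float(text)         # e.g. "1880358.0000", "0.3545"
    except ValueError:
        return text                # e.g. "96k", "<error: divide by zero>"


def parse_stats(text):
    """Map stat name -> value for every statistics line of ONE run's dump."""
    stats = {}
    for line in text.splitlines():
        # Skip option-dump lines, dashed separators and indented help text.
        if line.startswith(("#", "-", " ")):
            continue
        m = STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = parse_value(m.group(2))
    return stats


def ratio(stats, num, den):
    """Recompute a rate as num/den; return None instead of dividing by zero."""
    d = stats[den]
    return None if d == 0 else stats[num] / d
```

For this matrix run, `ratio(stats, "ul2.misses", "ul2.accesses")` is 2268/6397, which rounds to the printed 0.3545. Feed the parser one run at a time: on a file holding several runs, later runs would overwrite the earlier keys.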

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:05:44 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
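In the stock SimpleScalar 3.0 sources, as we understand them, sim-outorder charges a main-memory access as the first-chunk latency plus the inter-chunk latency for every further bus-width chunk of the block being fetched. The sketch below reproduces only that formula, for the 64-byte ul2 blocks configured by -cache:dl2. The -mem:maxBurstLength and -mem:minBurstLength options do not appear in the stock release as far as we know, so their effect in this modified build is unknown and the sketch deliberately ignores them. The function name is hypothetical.

```python
def mem_access_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to fetch one block from memory (stock-style model, no bursts).

    The block is split into ceil(block_bytes / bus_width) bus transfers; the
    first costs `first_chunk` cycles and each later one `inter_chunk` cycles.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (block_bytes + bus_width - 1) // bus_width   # integer ceiling
    return first_chunk + inter_chunk * (chunks - 1)
```

With this run's `-mem:lat 72 5` and `-mem:width 32`, a 64-byte block takes 2 chunks, so `mem_access_latency(64, 32, 72, 5)` is 77 cycles; the matrix run's `-mem:lat 91 10` with `-mem:width 8` needs 8 chunks, giving 161 cycles under the same model.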

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  The
  second argument, if specified with a `+', is relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
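To make the range grammar concrete, here is a small Python sketch of a parser for it. It is not how sim-outorder parses ranges internally, and the function names are hypothetical. It assumes numeric endpoints are decimal or 0x-prefixed literals and leaves program symbols such as `main` unresolved, since resolving them needs the program's symbol table.

```python
def _endpoint(tok):
    """Parse one endpoint into (kind, value); value is an int or a symbol."""
    kinds = {"@": "addr", "#": "cycle"}
    if tok[0] in kinds:
        kind, body = kinds[tok[0]], tok[1:]
    else:
        kind, body = "insn", tok      # no prefix: instruction count
    try:
        value = int(body, 0)          # assumption: C-style numeric literal
    except ValueError:
        value = body                  # program symbol, resolved elsewhere
    return kind, value


def parse_ptrace_range(spec):
    """Split '{{@|#}<start>}:{{@|#|+}<end>}' into (start, end) endpoints.

    A missing endpoint is None. A '+' end is relative to the start and keeps
    the start's kind; with a symbolic start it becomes (symbol, offset).
    """
    start_tok, sep, end_tok = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    start = _endpoint(start_tok) if start_tok else None
    if not end_tok:
        return start, None
    if end_tok[0] == "+":
        if start is None:
            raise ValueError("a '+' end needs a start")
        kind, base = start
        offset = int(end_tok[1:], 0)
        value = base + offset if isinstance(base, int) else (base, offset)
        return start, (kind, value)
    return start, _endpoint(end_tok)
```

For example, `parse_ptrace_range("1000:+100")` returns `(("insn", 1000), ("insn", 1100))`, matching the `1000:+100 == 1000:1100` rule stated in the help text.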

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
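The -bpred:2lev option takes its four values in the order `<l1size> <l2size> <hist_size> <xor>`, i.e. N, M, W, X, as the option dump shows, so when copying a sample predictor it is worth checking that the history width lands in the third position. The hypothetical helper below builds that tuple from the sizing rules in the help text; GAp and PAp need an explicit second-level size because the text only constrains it. By the same rules, the dump's default `1 1024 8 0` is a GAp configuration, since 1024 > 2^8.

```python
def twolev_args(scheme, hist_width, l1_entries=1, l2_entries=None):
    """Return (l1size, l2size, hist_size, xor) for a -bpred:2lev scheme."""
    full = 1 << hist_width                   # 2^W second-level counters
    if scheme == "GAg":
        n, m, x = 1, full, 0
    elif scheme == "gshare":
        n, m, x = 1, full, 1                 # xor history with branch address
    elif scheme == "PAg":
        n, m, x = l1_entries, full, 0
    elif scheme in ("GAp", "PAp"):
        if l2_entries is None:
            raise ValueError(scheme + " needs an explicit l2_entries")
        if scheme == "GAp" and l2_entries <= full:
            raise ValueError("GAp requires M > 2^W")
        n = 1 if scheme == "GAp" else l1_entries
        m, x = l2_entries, 0
    else:
        raise ValueError("unknown scheme: " + scheme)
    return (n, m, hist_width, x)
```

For instance, `" ".join(map(str, twolev_args("gshare", 8)))` yields `1 256 8 1`, the value string for -bpred:2lev.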

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
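A set-associative cache described by `<nsets>:<bsize>:<assoc>` holds nsets * assoc blocks, so its capacity is nsets * bsize * assoc bytes. Applied to this run, `dl1:512:64:2:l` and `il1:512:64:2:l` are 64 KB 2-way caches and `ul2:1024:64:8:l` is a 512 KB 8-way cache. The TLB options reuse the same format; here we assume `<bsize>` is the page size (4096 in -tlb:itlb), so `itlb:16:4096:4:l` has 64 entries. The names below are hypothetical helpers, not SimpleScalar APIs.

```python
from typing import NamedTuple

# Replacement-policy letters documented in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (assumed to be the page size for TLBs)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        """Total number of blocks (entries, for a TLB)."""
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        """Data capacity in bytes (for a TLB: the memory it can map)."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc),
                       REPL_POLICIES[repl])
```

For example, `parse_cache_config("ul2:1024:64:8:l").capacity_bytes // 1024` gives 512.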

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264954 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958479.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6535600 # total simulation time in cycles
sim_IPC                      1.7721 # instructions per cycle
sim_CPI                      0.5643 # cycles per instruction
sim_exec_BW                  1.8766 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19397213 # cumulative IFQ occupancy
IFQ_fcount                  4021144 # cumulative IFQ full count
ifq_occupancy                2.9679 # avg IFQ occupancy (insn's)
ifq_rate                     1.8766 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5815 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6153 # fraction of time (cycle's) IFQ was full
RUU_count                  79836389 # cumulative RUU occupancy
RUU_fcount                  3419160 # cumulative RUU full count
ruu_occupancy               12.2156 # avg RUU occupancy (insn's)
ruu_rate                     1.8766 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.5093 # avg RUU occupant latency (cycle's)
ruu_full                     0.5232 # fraction of time (cycle's) RUU was full
LSQ_count                  32706132 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0043 # avg LSQ occupancy (insn's)
lsq_rate                     1.8766 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6666 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126609630 # total number of slip cycles
avg_sim_slip                10.9320 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820213 # total number of accesses
il1.hits                   12819996 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820213 # total number of accesses
itlb.hits                  12820206 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917910 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
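The derived pipeline statistics near the top of this block follow from the raw counters. The formulas in this sketch are inferred from the stat descriptions rather than taken from the simulator source, but they reproduce the four-decimal values printed for this sort run; for example, 11581513 / 6535600 rounds to the reported sim_IPC of 1.7721. The function name `derived_metrics` is hypothetical.

```python
def derived_metrics(s):
    """Recompute formula stats from the raw counters in dict `s`."""
    cycles = s["sim_cycle"]
    committed = s["sim_num_insn"]        # committed instructions
    executed = s["sim_total_insn"]       # includes mis-speculated ones
    return {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "sim_exec_BW": executed / cycles,
        "sim_IPB": committed / s["sim_num_branches"],
        "ifq_occupancy": s["IFQ_count"] / cycles,
        "ifq_latency": s["IFQ_count"] / executed,
        "ruu_occupancy": s["RUU_count"] / cycles,
        "ruu_latency": s["RUU_count"] / executed,
        "ruu_full": s["RUU_fcount"] / cycles,
        "avg_sim_slip": s["sim_slip"] / committed,
    }
```

Comparing `round(value, 4)` for each key against the printed statistics is a quick consistency check on a dump.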

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:05:52 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375617 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10532381 # total simulation time in cycles
sim_IPC                      1.2648 # instructions per cycle
sim_CPI                      0.7907 # cycles per instruction
sim_exec_BW                  1.2700 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  40979678 # cumulative IFQ occupancy
IFQ_fcount                 10094275 # cumulative IFQ full count
ifq_occupancy                3.8908 # avg IFQ occupancy (insn's)
ifq_rate                     1.2700 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.0638 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9584 # fraction of time (cycle's) IFQ was full
RUU_count                 164845908 # cumulative RUU occupancy
RUU_fcount                  9953233 # cumulative RUU full count
ruu_occupancy               15.6513 # avg RUU occupancy (insn's)
ruu_rate                     1.2700 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.3244 # avg RUU occupant latency (cycle's)
ruu_full                     0.9450 # fraction of time (cycle's) RUU was full
LSQ_count                  87869122 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3428 # avg LSQ occupancy (insn's)
lsq_rate                     1.2700 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5694 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  272429127 # total number of slip cycles
avg_sim_slip                20.4512 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
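The il1, dl1 and ul2 counters above (from a run whose header precedes this excerpt) fit a simple traffic identity: every il1 miss, dl1 miss and dl1 dirty writeback appears to become exactly one unified-L2 access. The sketch below checks that identity, which is an observation from this log rather than a documented guarantee. The same identity holds in the filter run further down (179 + 3131 + 791 = 4101). It also contrasts the *local* rate that `ul2.miss_rate` reports with a *global* rate, meaning L2 misses per L1 reference. All variable names are illustrative.

```python
# Counters copied from the il1, dl1 and ul2 blocks above.
il1_accesses, il1_misses = 13400097, 727
dl1_accesses, dl1_misses, dl1_writebacks = 6169439, 287722, 143444
ul2_accesses, ul2_misses = 431893, 32394

# Assumed traffic model: each il1 miss, dl1 miss and dl1 dirty writeback
# turns into exactly one access to the unified L2 (il2 is pointed at dl2).
predicted_ul2_accesses = il1_misses + dl1_misses + dl1_writebacks

# ul2.miss_rate is a local rate (per L2 access); dividing by all L1
# accesses instead gives L2 misses per L1 reference.
local_miss_rate = ul2_misses / ul2_accesses
global_miss_rate = ul2_misses / (il1_accesses + dl1_accesses)

print("predicted ul2.accesses:", predicted_ul2_accesses, "reported:", ul2_accesses)
print(f"local L2 miss rate  {local_miss_rate:.4f}")
print(f"global L2 miss rate {global_miss_rate:.4f}")
```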
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:06:03 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if prefixed with a `+', is a value relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
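The range grammar above can be illustrated with a small parser. This is a minimal sketch, not SimpleScalar's own range parser. The names `parse_ptrace_range` and `Endpoint` are hypothetical. Numbers are assumed to be decimal or 0x-prefixed hex. Program symbols such as `main` are returned unresolved rather than looked up in the symbol table. A `+N` end inherits the kind of the start, following the `1000:+100 == 1000:1100` rule.

```python
from typing import NamedTuple, Optional, Tuple, Union


class Endpoint(NamedTuple):
    kind: str               # "addr" (@ prefix), "cycle" (# prefix) or "insn" (no prefix)
    value: Union[int, str]  # a number, or an unresolved program symbol such as "main"


_PREFIX_KINDS = {"@": "addr", "#": "cycle"}


def _number_or_symbol(text: str) -> Union[int, str]:
    # Decimal or 0x-prefixed hex become ints; anything else stays a symbol name.
    try:
        return int(text, 0)
    except ValueError:
        return text


def _endpoint(text: str) -> Endpoint:
    kind = _PREFIX_KINDS.get(text[0])
    if kind is None:
        return Endpoint("insn", _number_or_symbol(text))
    return Endpoint(kind, _number_or_symbol(text[1:]))


def parse_ptrace_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>'; an omitted end is returned as None (open-ended)."""
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _endpoint(start_txt) if start_txt else None
    if not end_txt:
        return start, None
    if not end_txt.startswith("+"):
        return start, _endpoint(end_txt)
    # A '+N' end is relative to the start and inherits the start's kind.
    if start is None:
        raise ValueError("a '+' end needs an explicit start")
    offset = int(end_txt[1:], 0)
    if isinstance(start.value, int):
        return start, Endpoint(start.kind, start.value + offset)
    return start, Endpoint(start.kind, f"{start.value}+{offset}")


for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"]:
    print(f"{spec:12s} -> {parse_ptrace_range(spec)}")
```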

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (fields listed as N, W, M, X):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
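The help text orders the fields N, M, W, X in its header but writes the sample predictors as N, W, M, X. The option listing orders `-bpred:2lev` as `<l1size> <l2size> <hist_size> <xor>`, which is N, M, W, X. The sketch below converts the sample fields into the flag order. The helper name `bpred_2lev_flag` is hypothetical. The power-of-two check on both table sizes is our assumption about what the simulator accepts.

```python
def is_power_of_two(value: int) -> bool:
    return value > 0 and value & (value - 1) == 0


def bpred_2lev_flag(n: int, w: int, m: int, xor: bool) -> str:
    """Turn the help text's (N, W, M, X) fields into a -bpred:2lev argument string.

    The option listing orders the arguments <l1size> <l2size> <hist_size> <xor>,
    i.e. N, M, W, X, while the sample predictors are written N, W, M, X.
    """
    # Assumption: both table sizes must be powers of two.
    if not (is_power_of_two(n) and is_power_of_two(m)):
        raise ValueError(f"N={n} and M={m} must both be powers of two")
    if w < 1:
        raise ValueError("history width W must be at least 1")
    return f"-bpred:2lev {n} {m} {w} {int(xor)}"


W = 8
samples = {
    "GAg":    (1, W, 2 ** W, False),
    "gshare": (1, W, 2 ** W, True),
    "PAg":    (1024, W, 2 ** W, False),  # N = 1024 chosen only for illustration
}
for name, fields in samples.items():
    print(f"{name:7s} {bpred_2lev_flag(*fields)}")
```

Read the same way, the default `-bpred:2lev 1 1024 8 0` in the option dump is N=1, M=1024, W=8, X=0. Since M > 2^W, this is a GAp configuration in the help text's taxonomy.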

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
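The `<name>:<nsets>:<bsize>:<assoc>:<repl>` format fixes a cache's geometry, and the sketch below derives capacity and address-field widths from it. It is illustrative only. `CacheConfig` and `parse_cache_config` are hypothetical names. The sketch assumes that set count and block size are powers of two. The configs come from this run's option dump. For the TLB entry, which uses the same format with the page size in the block-size slot, "capacity" is the TLB reach.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc

    @property
    def offset_bits(self) -> int:
        # Address bits that select a byte within a block.
        return self.bsize.bit_length() - 1

    @property
    def index_bits(self) -> int:
        # Address bits that select a set.
        return self.nsets.bit_length() - 1


def parse_cache_config(text: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = text.split(":")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    # Assumption: sets and block size are powers of two, so bit counts are exact.
    for value in (cfg.nsets, cfg.bsize):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{value} is not a power of two")
    return cfg


for text in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"]:
    c = parse_cache_config(text)
    print(f"{c.name:5s} {c.capacity_bytes // 1024:4d} KB  "
          f"offset={c.offset_bits} index={c.index_bits} ways={c.assoc}")
```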

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12717662 # total simulation time in cycles
sim_IPC                      1.6811 # instructions per cycle
sim_CPI                      0.5949 # cycles per instruction
sim_exec_BW                  1.6866 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
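The headline ratios of the filter run can be recomputed from its raw counters. The sketch below reproduces `sim_IPC`, `sim_CPI`, `sim_exec_BW`, `sim_IPB` and `sim_inst_rate` to the printed precision. It also derives one extra figure not in the log: the share of executed instructions that were squashed wrong-path work. The variable names are illustrative.

```python
# Headline counters from the filter run above.
sim_num_insn = 21379256        # committed instructions
sim_total_insn = 21450114      # committed + mis-speculated
sim_num_branches = 1647342
sim_cycle = 12717662
sim_elapsed_time = 14          # host seconds

derived = {
    "sim_IPC": sim_num_insn / sim_cycle,
    "sim_CPI": sim_cycle / sim_num_insn,
    "sim_exec_BW": sim_total_insn / sim_cycle,
    "sim_IPB": sim_num_insn / sim_num_branches,
    "sim_inst_rate": sim_num_insn / sim_elapsed_time,
}
for name, value in derived.items():
    print(f"{name:14s} {value:.4f}")

# Share of executed instructions that were on a mispredicted path and squashed.
wrong_path = (sim_total_insn - sim_num_insn) / sim_total_insn
print(f"wrong-path fraction {wrong_path:.4%}")
```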
IFQ_count                  49677158 # cumulative IFQ occupancy
IFQ_fcount                 11800117 # cumulative IFQ full count
ifq_occupancy                3.9062 # avg IFQ occupancy (insn's)
ifq_rate                     1.6866 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3159 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 201848297 # cumulative RUU occupancy
RUU_fcount                 12579992 # cumulative RUU full count
ruu_occupancy               15.8715 # avg RUU occupancy (insn's)
ruu_rate                     1.6866 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.4101 # avg RUU occupant latency (cycle's)
ruu_full                     0.9892 # fraction of time (cycle's) RUU was full
LSQ_count                  64694819 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0870 # avg LSQ occupancy (insn's)
lsq_rate                     1.6866 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0161 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
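The IFQ, RUU and LSQ statistics above appear to come from two cumulative counters per queue, combined as follows. Occupancy is count / sim_cycle. Rate is sim_total_insn / sim_cycle. Latency is count / sim_total_insn. Full is fcount / sim_cycle. This derivation is inferred from the numbers in this log. Under it, occupancy = rate × latency holds exactly, which is Little's law. The LSQ rate and latency also seem to be normalized by all instructions rather than by memory operations only, which explains why `lsq_rate` equals `ifq_rate`.

```python
sim_cycle = 12717662
sim_total_insn = 21450114  # committed + mis-speculated instructions

# name: (cumulative occupancy, cumulative full count), from the filter run above
queues = {
    "ifq": (49677158, 11800117),
    "ruu": (201848297, 12579992),
    "lsq": (64694819, 0),
}

results = {}
for name, (count, fcount) in queues.items():
    occupancy = count / sim_cycle      # average entries held per cycle
    rate = sim_total_insn / sim_cycle  # instructions dispatched per cycle
    latency = count / sim_total_insn   # average cycles per occupant
    full = fcount / sim_cycle          # fraction of cycles the queue was full
    results[name] = (occupancy, rate, latency, full)
    print(f"{name}: occupancy={occupancy:.4f} rate={rate:.4f} "
          f"latency={latency:.4f} full={full:.4f} "
          f"rate*latency={rate * latency:.4f}")
```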
sim_slip                  294325545 # total number of slip cycles
avg_sim_slip                13.7669 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
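Each predictor rate above is a ratio of two raw counters. The `<error: divide by zero>` line appears because no non-RAS JRs were seen. The sketch below recomputes the filter run's rates and returns `None` for an empty denominator, mirroring that output. In every run in this log, `misses` equals `updates - dir_hits`. The helper name `rate` is hypothetical.

```python
from typing import Optional


def rate(num: int, den: int) -> Optional[float]:
    # sim-outorder prints "<error: divide by zero>" for an empty denominator;
    # this sketch mirrors that by returning None.
    return None if den == 0 else num / den


# Raw counters copied from the bpred_bimod.* lines of the filter run above.
updates, addr_hits, dir_hits = 1647342, 1638967, 1639058
jr_hits, jr_seen = 45, 52
jr_non_ras_hits, jr_non_ras_seen = 0, 0
ras_hits, used_ras = 45, 52

stats = {
    "bpred_addr_rate": rate(addr_hits, updates),
    "bpred_dir_rate": rate(dir_hits, updates),
    "bpred_jr_rate": rate(jr_hits, jr_seen),
    "bpred_jr_non_ras_rate": rate(jr_non_ras_hits, jr_non_ras_seen),
    "ras_rate": rate(ras_hits, used_ras),
}
# Observed in all three runs here: misses == updates - dir_hits.
misses = updates - dir_hits

for name, value in stats.items():
    shown = "<error: divide by zero>" if value is None else f"{value:.4f}"
    print(f"{name:22s} {shown}")
print(f"{'misses':22s} {misses}")
```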
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
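A rough average memory access time (AMAT) for data accesses in the filter run can be estimated from its configuration and the miss rates above. The configuration values are `-cache:dl1lat 3`, `-cache:dl2lat 12`, `-mem:lat 72 5`, `-mem:width 32` and 64-byte ul2 blocks. This is a textbook estimate under stated assumptions, not what sim-outorder computes. The per-block memory latency uses the classic first-chunk-plus-inter-chunk formula. The `-mem:minBurstLength`/`-mem:maxBurstLength` options of this build are ignored. TLB misses and out-of-order overlap are also left out. The helper name `mem_block_latency` is hypothetical.

```python
def mem_block_latency(first_chunk: int, inter_chunk: int, bsize: int, bus_width: int) -> int:
    # Assumed model: a block crosses the bus in ceil(bsize / bus_width) chunks;
    # the first costs first_chunk cycles and each later chunk inter_chunk cycles.
    # The -mem:minBurstLength/-mem:maxBurstLength options are NOT modelled.
    chunks = -(-bsize // bus_width)
    return first_chunk + inter_chunk * (chunks - 1)


# Configuration of the filter run (see its option dump).
dl1_lat, ul2_lat = 3, 12
mem_lat = mem_block_latency(72, 5, bsize=64, bus_width=32)

# Measured miss rates of the filter run.
dl1_miss_rate = 3131 / 4943189
ul2_miss_rate = 2489 / 4101   # local rate; also covers instruction and writeback traffic

# Textbook AMAT, ignoring TLB misses and overlap between outstanding misses.
amat = dl1_lat + dl1_miss_rate * (ul2_lat + ul2_miss_rate * mem_lat)
print(f"memory block latency {mem_lat} cycles, estimated AMAT {amat:.2f} cycles")
```

Although the local L2 miss rate is about 61%, the dl1 miss rate is so low that, in this simplified model, the estimate stays close to the 3-cycle L1 hit latency.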
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
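The `mem.*` lines follow from their counters if one assumes 4 KB simulator pages: `page_mem` = `page_count` × 4 KB, and `ptab_miss_rate` = `ptab_misses` / `ptab_accesses`. The page size is an assumption, though it reproduces both `page_mem` values in this excerpt. The sketch checks the two complete `mem.*` blocks shown here. One comes from the run whose header precedes this excerpt; the other is the filter run. The helper name `page_mem_kb` is hypothetical.

```python
PAGE_BYTES = 4096  # assumed simulator page size; it reproduces both page_mem values


def page_mem_kb(page_count: int) -> int:
    return page_count * PAGE_BYTES // 1024


# (page_count, ptab_misses, ptab_accesses) from the two complete mem.* blocks
runs = {
    "preceding": (190, 3262, 156766194),
    "filter": (30, 12303390, 178886464),
}
for name, (pages, misses, accesses) in runs.items():
    print(f"{name:10s} page_mem={page_mem_kb(pages)}k "
          f"ptab_miss_rate={misses / accesses:.4f}")
```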

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:06:17 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861747 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  40417941 # total simulation time in cycles
sim_IPC                      0.6893 # instructions per cycle
sim_CPI                      1.4508 # cycles per instruction
sim_exec_BW                  0.6893 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 161543945 # cumulative IFQ occupancy
IFQ_fcount                 40385746 # cumulative IFQ full count
ifq_occupancy                3.9968 # avg IFQ occupancy (insn's)
ifq_rate                     0.6893 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.7981 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 646181180 # cumulative RUU occupancy
RUU_fcount                 40384886 # cumulative RUU full count
ruu_occupancy               15.9875 # avg RUU occupancy (insn's)
ruu_rate                     0.6893 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.1924 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 196333291 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8576 # avg LSQ occupancy (insn's)
lsq_rate                     0.6893 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.0467 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  879022882 # total number of slip cycles
avg_sim_slip                31.5518 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:06:41 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
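The `-mem:lat` pair is described only as `<first_chunk> <inter_chunk>`. The sketch below shows the latency this implies for one cache-block fill. It assumes the block crosses the memory bus in ceil(bsize / width) chunks, with the first chunk costing `<first_chunk>` cycles and each later chunk costing `<inter_chunk>` cycles. `mem_access_latency` is a name chosen here for illustration, and the `-mem:maxBurstLength` / `-mem:minBurstLength` options are not modeled.

```python
def mem_access_latency(first_chunk: int, inter_chunk: int,
                       bus_width: int, block_size: int) -> int:
    """Cycles to move one block over the memory bus (sketch).

    The first bus-width chunk costs `first_chunk` cycles and every further
    chunk costs `inter_chunk` cycles, following the -mem:lat description.
    Burst-length limits are ignored.
    """
    if bus_width <= 0 or block_size <= 0:
        raise ValueError("bus width and block size must be positive")
    chunks = -(-block_size // bus_width)  # ceil(block_size / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)


# This run: -mem:lat 72 5, -mem:width 64, 64-byte ul2 blocks (ul2:1024:64:8:l)
print(mem_access_latency(72, 5, 64, 64))   # prints 72 (one chunk)
# Command line at the top of this log: -mem:lat 91 10, -mem:width 8
print(mem_access_latency(91, 10, 8, 64))   # prints 161 (eight chunks)
```

With the 64-byte bus in this run, a 64-byte ul2 block moves in a single chunk, so the inter-chunk term drops out. The 8-byte bus used earlier in the log needs eight chunks per block.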

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  If
  the second argument starts with a `+', it is relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
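The range syntax can be made concrete with a small parser. This is a hypothetical helper (`parse_range` and `Pos` are names invented here), not the simulator's own parser. Program symbols such as `main` are kept as unresolved strings rather than looked up in a symbol table.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

# Endpoint prefixes per the help text: '@' = address, '#' = cycle count,
# no prefix = instruction count.
PREFIX_KIND = {"@": "addr", "#": "cycle"}


@dataclass
class Pos:
    kind: str                # "addr", "cycle", or "insn"
    value: Union[int, str]   # a number, or a program symbol left unresolved


def _value(text: str) -> Union[int, str]:
    # Decimal or 0x-prefixed hex numbers; anything else is treated as a symbol.
    try:
        return int(text, 0)
    except ValueError:
        return text


def _endpoint(text: str) -> Pos:
    if text[0] in PREFIX_KIND:
        return Pos(PREFIX_KIND[text[0]], _value(text[1:]))
    return Pos("insn", _value(text))


def parse_range(spec: str) -> tuple[Optional[Pos], Optional[Pos]]:
    """Split '{{@|#}<start>}:{{@|#|+}<end>}' into (start, end); None = open end."""
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _endpoint(start_txt) if start_txt else None
    if not end_txt:
        return start, None
    if end_txt[0] != "+":
        return start, _endpoint(end_txt)
    # '+' makes the end relative to the start: 1000:+100 == 1000:1100
    if start is None:
        raise ValueError("a relative end needs a start")
    delta = _value(end_txt[1:])
    if isinstance(start.value, int) and isinstance(delta, int):
        return start, Pos(start.kind, start.value + delta)
    return start, Pos(start.kind, f"{start.value}+{end_txt[1:]}")


for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
    print(spec, parse_range(spec))
```

A relative end inherits the kind of its start, so `@main:+278` stays an address range even though `main` is not resolved here.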

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (N, M, W, X, the argument order of -bpred:2lev):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
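A minimal sketch of how the four parameters could drive a 2-level predictor, taking them in the order `-bpred:2lev` expects (`<l1size> <l2size> <hist_size> <xor>`, i.e. N, M, W, X). The indexing is an assumption: history bits fill the low W bits of the second-level index, branch-address bits sit above them, and X=1 XORs history with the address (gshare style). The simulator's exact bit selection may differ. `TwoLevel` and `INSN_SHIFT` are hypothetical names, and the 2-bit counters start weakly taken by choice.

```python
INSN_SHIFT = 3  # assumption: drop low address bits (PISA instructions are 8 bytes)


class TwoLevel:
    """Sketch of a 2-level predictor configured by (N, M, W, X)."""

    def __init__(self, n: int, m: int, w: int, x: int):
        for v in (n, m):
            if v <= 0 or v & (v - 1):
                raise ValueError("N and M must be powers of two")
        self.n, self.m, self.w, self.x = n, m, w, x
        self.hist = [0] * n   # N first-level shift registers, W bits each
        self.ctr = [2] * m    # M 2-bit saturating counters, weakly taken

    def _index(self, baddr: int) -> int:
        a = baddr >> INSN_SHIFT
        h = self.hist[a & (self.n - 1)]
        low = (h ^ a) & ((1 << self.w) - 1) if self.x else h
        # Address bits above the W history bits select among pattern tables;
        # when M == 2^W they are masked away (GAg, PAg, gshare).
        return (low | (a << self.w)) & (self.m - 1)

    def predict(self, baddr: int) -> bool:
        return self.ctr[self._index(baddr)] >= 2

    def update(self, baddr: int, taken: bool) -> None:
        i = self._index(baddr)
        self.ctr[i] = min(3, self.ctr[i] + 1) if taken else max(0, self.ctr[i] - 1)
        r = (baddr >> INSN_SHIFT) & (self.n - 1)
        self.hist[r] = ((self.hist[r] << 1) | int(taken)) & ((1 << self.w) - 1)
```

The default in the option listing, `-bpred:2lev 1 1024 8 0`, reads as N=1, M=1024, W=8, X=0, a GAp-style configuration (M > 2^W). These runs use `-bpred bimod`, so it is inactive here. A usage sketch: construct `TwoLevel(1, 1024, 8, 0)`, call `predict` before each branch and `update` with the real outcome afterwards.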

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
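A small parser for this format makes the arithmetic explicit: capacity is nsets × bsize × assoc, and with power-of-two sizes an address splits into tag, set index, and block offset. `parse_cache` and `CacheConfig` are hypothetical names. The sketch requires powers of two because its bit slicing depends on it. The TLB configs in this log use the same format with a bsize of 4096, which reads as the page size.

```python
from __future__ import annotations

from dataclasses import dataclass

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # bytes per block (page size for a TLB)
    assoc: int
    repl: str

    @property
    def size_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, set index, block offset) for a byte address."""
        off_bits = self.bsize.bit_length() - 1
        idx_bits = self.nsets.bit_length() - 1
        offset = addr & (self.bsize - 1)
        index = (addr >> off_bits) & (self.nsets - 1)
        tag = addr >> (off_bits + idx_bits)
        return tag, index, offset


def parse_cache(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])
    for v in (cfg.nsets, cfg.bsize):
        if v <= 0 or v & (v - 1):
            raise ValueError("nsets and bsize must be powers of two")
    return cfg


for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"]:
    cfg = parse_cache(spec)
    print(spec, cfg.size_bytes // 1024, "KiB")
```

For the configurations in these runs, `dl1:512:64:2:l` gives 64 KiB, `ul2:1024:64:8:l` gives 512 KiB, and `dtlb:32:4096:4:l` maps 128 pages (512 KiB of reach).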

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
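With il2 pointed at dl2, as in these runs (`-cache:il2 dl2` in the option listing), the unified L2 is fed by three streams: il1 misses, dl1 misses, and dirty blocks written back from dl1. The counters in this log are consistent with that. The check below uses numbers copied from the two runs in this section whose cache statistics are complete; it is a consistency check on the log, and `ul2_accesses_expected` is a name chosen here.

```python
def ul2_accesses_expected(il1_misses: int, dl1_misses: int,
                          dl1_writebacks: int) -> int:
    # Unified L2 traffic: instruction fills + data fills + dl1 dirty writebacks.
    return il1_misses + dl1_misses + dl1_writebacks


# (il1.misses, dl1.misses, dl1.writebacks, ul2.accesses) copied from this log
RUNS = {
    "earlier run": (211, 2572213, 957274, 3529698),
    "matrix, -mem:width 64": (178, 5727, 492, 6397),
}

for label, (il1_m, dl1_m, dl1_wb, ul2_acc) in RUNS.items():
    expected = ul2_accesses_expected(il1_m, dl1_m, dl1_wb)
    print(f"{label}: expected {expected}, logged {ul2_acc}, "
          f"match={expected == ul2_acc}")
```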



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6801123 # total simulation time in cycles
sim_IPC                      1.9272 # instructions per cycle
sim_CPI                      0.5189 # cycles per instruction
sim_exec_BW                  1.9334 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25323040 # cumulative IFQ occupancy
IFQ_fcount                  6205157 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insn's)
ifq_rate                     1.9334 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9259 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104570129 # cumulative RUU occupancy
RUU_fcount                  5641318 # cumulative RUU full count
ruu_occupancy               15.3754 # avg RUU occupancy (insn's)
ruu_rate                     1.9334 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9527 # avg RUU occupant latency (cycle's)
ruu_full                     0.8295 # fraction of time (cycle's) RUU was full
LSQ_count                  31945914 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6972 # avg LSQ occupancy (insn's)
lsq_rate                     1.9334 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4295 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153567281 # total number of slip cycles
avg_sim_slip                11.7163 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:06:50 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264665 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6339581 # total simulation time in cycles
sim_IPC                      1.8269 # instructions per cycle
sim_CPI                      0.5474 # cycles per instruction
sim_exec_BW                  1.9346 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18674483 # cumulative IFQ occupancy
IFQ_fcount                  3840461 # cumulative IFQ full count
ifq_occupancy                2.9457 # avg IFQ occupancy (insn's)
ifq_rate                     1.9346 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5226 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6058 # fraction of time (cycle's) IFQ was full
RUU_count                  76943818 # cumulative RUU occupancy
RUU_fcount                  3238545 # cumulative RUU full count
ruu_occupancy               12.1371 # avg RUU occupancy (insn's)
ruu_rate                     1.9346 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2736 # avg RUU occupant latency (cycle's)
ruu_full                     0.5108 # fraction of time (cycle's) RUU was full
LSQ_count                  32053684 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0561 # avg LSQ occupancy (insn's)
lsq_rate                     1.9346 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6135 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123064598 # total number of slip cycles
avg_sim_slip                10.6260 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
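Most of the rate statistics are ratios of raw counters printed in the same block, so they can be recomputed, or compared across runs, from the text alone. The Python sketch below assumes every statistic line has the `name value # description` layout used above. `parse_stats` and `derived_metrics` are hypothetical helpers, and the formulas are inferred from the statistic descriptions; they agree with the printed values for the matrix run above to four decimals.

```python
import re

# A statistic line starts with an identifier (letters, digits, '_' or '.'),
# then the value, then '#' and a description. Option lines ('-seed ...'),
# commented-out options ('# -config ...') and indented help text do not match.
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#")


def parse_value(text):
    """Convert a printed value to int or float when possible."""
    if text.startswith("0x"):
        return int(text, 16)
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # e.g. '292k' stays a string


def parse_stats(log_text):
    """Collect 'name value # description' lines into a dict (hypothetical helper)."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = parse_value(match.group(2))
    return stats


def derived_metrics(s):
    """Recompute a few ratio statistics from the raw counters."""
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


# Counters copied from the matrix run above.
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_branches            3128464 # total number of branches committed
sim_cycle                   6339581 # total simulation time in cycles
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
mem.page_mem                   292k # total size of memory pages allocated
"""

for name, value in derived_metrics(parse_stats(SAMPLE)).items():
    print(f"{name:16s} {value:.4f}")
```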

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:06:57 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
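In stock SimpleScalar 3.0, sim-outorder charges a main-memory access the `-mem:lat` first-chunk latency plus the inter-chunk latency for each further bus-width chunk of the block, with the block split into ceil(block size / `-mem:width`) chunks. A Python sketch of that model follows; the function name mirrors sim-outorder's `mem_access_latency`, but the parameters are passed explicitly here. The `-mem:maxBurstLength` and `-mem:minBurstLength` options in this build are not part of stock 3.0 and their effect is deliberately not modeled, so treat the numbers as an approximation.

```python
def mem_access_latency(block_size, bus_width, first_chunk, inter_chunk):
    """Cycles to fetch one block from main memory (stock 3.0 model).

    Burst-length limits (-mem:maxBurstLength / -mem:minBurstLength) are
    NOT modeled; that is an explicit assumption of this sketch.
    """
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_size // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes in every run of this log.
print(mem_access_latency(64, 64, 72, 5))   # -mem:width 64 -mem:lat 72 5  -> 72
print(mem_access_latency(64, 8, 91, 10))   # -mem:width 8 -mem:lat 91 10 -> 161
```

With 64-byte ul2 blocks, the 64-byte bus configured above moves a block in one chunk (72 cycles), whereas the 8-byte bus of the earlier matrix run needs eight chunks (91 + 7 × 10 = 161 cycles), before any burst-length adjustment this build may apply.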

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375044 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9180775 # total simulation time in cycles
sim_IPC                      1.4510 # instructions per cycle
sim_CPI                      0.6892 # cycles per instruction
sim_exec_BW                  1.4569 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  35778891 # cumulative IFQ occupancy
IFQ_fcount                  8794078 # cumulative IFQ full count
ifq_occupancy                3.8972 # avg IFQ occupancy (insn's)
ifq_rate                     1.4569 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6750 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 144025965 # cumulative RUU occupancy
RUU_fcount                  8653177 # cumulative RUU full count
ruu_occupancy               15.6878 # avg RUU occupancy (insn's)
ruu_rate                     1.4569 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.7683 # avg RUU occupant latency (cycle's)
ruu_full                     0.9425 # fraction of time (cycle's) RUU was full
LSQ_count                  75479055 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2214 # avg LSQ occupancy (insn's)
lsq_rate                     1.4569 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6433 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  239220546 # total number of slip cycles
avg_sim_slip                17.9583 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
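The IFQ, RUU, and LSQ statistics follow one pattern: `*_count` accumulates the queue's occupancy every cycle and `*_fcount` counts the cycles in which it was full. Dividing by `sim_cycle` gives the average occupancy and the full fraction, `sim_total_insn / sim_cycle` gives the dispatch rate, and dividing the count by `sim_total_insn` gives the average number of cycles an instruction spends in the queue, so occupancy = rate × latency (Little's law). The helper name `queue_stats` is hypothetical; the formulas were checked against the fft run above.

```python
def queue_stats(count, full_count, total_insn, cycles):
    """Derive the per-queue averages sim-outorder prints from raw counters."""
    occupancy = count / cycles        # avg entries in the queue per cycle
    rate = total_insn / cycles        # avg instructions dispatched per cycle
    latency = count / total_insn      # avg cycles each instruction spends queued
    full = full_count / cycles        # fraction of cycles the queue was full
    return {"occupancy": occupancy, "rate": rate, "latency": latency, "full": full}


# Raw counters from the fft run above.
CYCLES, TOTAL_INSN = 9180775, 13375044
for name, count, fcount in (("ifq", 35778891, 8794078),
                            ("ruu", 144025965, 8653177),
                            ("lsq", 75479055, 0)):
    s = queue_stats(count, fcount, TOTAL_INSN, CYCLES)
    # Little's law: occupancy == rate * latency (up to float rounding).
    assert abs(s["occupancy"] - s["rate"] * s["latency"]) < 1e-9
    print(name, {k: round(v, 4) for k, v in s.items()})
```

The rounded values reproduce the `ifq_*`, `ruu_*`, and `lsq_*` lines of the fft run. With a 16-entry RUU, an average occupancy near 15.7 and a full fraction of 0.94 suggest the RUU, rather than the 32-entry LSQ, limits dispatch in this run.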

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:07:08 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
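The runs above select `-bpred bimod` with a 16384-entry table. A bimodal predictor keeps one 2-bit saturating counter per table entry, indexed by branch-address bits, and predicts taken when the counter is in one of its two upper states. The Python sketch below is a generic textbook version, not SimpleScalar's code: its index function (dropping the 3 byte-offset bits of 8-byte PISA instructions, then masking) and its weakly-taken initial counters are assumptions, and SimpleScalar's own hash and initialization may differ in detail.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Generic textbook scheme; the index function and initial counter
    values are assumptions, not SimpleScalar's exact implementation.
    """

    def __init__(self, table_size=16384, inst_bytes=8):
        if table_size <= 0 or table_size & (table_size - 1):
            raise ValueError("table size must be a power of two")
        self.mask = table_size - 1
        self.shift = inst_bytes.bit_length() - 1  # drop byte-offset bits (3 for 8-byte insts)
        self.counters = [2] * table_size          # start weakly taken (assumption)

    def _index(self, pc):
        return (pc >> self.shift) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # True means "taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bp = BimodalPredictor()
pc = 0x00400200                              # hypothetical branch address
outcomes = ([True] * 9 + [False]) * 100      # loop branch: 9 taken, then exit
hits = 0
for taken in outcomes:
    hits += bp.predict(pc) == taken
    bp.update(pc, taken)
rate = hits / len(outcomes)
print(rate)
```

On this loop pattern the sketch mispredicts only the exit: one not-taken outcome moves a saturated counter from 3 to 2, which still predicts taken, so the hit rate is 0.9.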

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12555950 # total simulation time in cycles
sim_IPC                      1.7027 # instructions per cycle
sim_CPI                      0.5873 # cycles per instruction
sim_exec_BW                  1.7084 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49080422 # cumulative IFQ occupancy
IFQ_fcount                 11650933 # cumulative IFQ full count
ifq_occupancy                3.9089 # avg IFQ occupancy (insn's)
ifq_rate                     1.7084 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2881 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199457825 # cumulative RUU occupancy
RUU_fcount                 12430808 # cumulative RUU full count
ruu_occupancy               15.8855 # avg RUU occupancy (insn's)
ruu_rate                     1.7084 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2987 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  63945155 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0928 # avg LSQ occupancy (insn's)
lsq_rate                     1.7084 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9811 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291185841 # total number of slip cycles
avg_sim_slip                13.6200 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 5 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:07:22 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 5 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
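The `-mem:lat` option takes a first-chunk and an inter-chunk latency, and `-mem:width` sets the memory bus width in bytes. The sketch below assumes the chunked model of stock SimpleScalar 3.0. Under that model, a block of B bytes on a W-byte bus needs ceil(B/W) transfers and costs first_chunk + inter_chunk * (chunks - 1) cycles. The `-mem:maxBurstLength` and `-mem:minBurstLength` options appear to be additions in this build. Their effect is not documented here, so the sketch ignores them. `mem_access_latency` is written here for illustration.

```python
def mem_access_latency(blk_sz: int, bus_width: int, first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from main memory under a chunked bus model.

    first_chunk and inter_chunk are the two values given to -mem:lat;
    bus_width is -mem:width in bytes. Burst-length options are ignored.
    """
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    # Number of bus transfers needed to move the whole block (ceiling division).
    chunks = (blk_sz + bus_width - 1) // bus_width
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # 64-byte ul2 blocks, as configured by -cache:dl2 ul2:1024:64:8:l
    print(mem_access_latency(64, 64, 72, 5))  # -mem:width 64 -mem:lat 72 5 -> 72
    print(mem_access_latency(64, 8, 72, 4))   # -mem:width 8  -mem:lat 72 4 -> 100
```

Under this model, widening the bus from 8 to 64 bytes turns a 64-byte L2 miss from eight transfers (72 + 4 * 7 = 100 cycles) into a single 72-cycle transfer.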



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861171 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37161453 # total simulation time in cycles
sim_IPC                      0.7497 # instructions per cycle
sim_CPI                      1.3339 # cycles per instruction
sim_exec_BW                  0.7497 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 148578185 # cumulative IFQ occupancy
IFQ_fcount                 37144306 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7497 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3328 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 594315044 # cumulative RUU occupancy
RUU_fcount                 37143590 # cumulative RUU full count
ruu_occupancy               15.9928 # avg RUU occupancy (insn's)
ruu_rate                     0.7497 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.3313 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 180662419 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8616 # avg LSQ occupancy (insn's)
lsq_rate                     0.7497 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4844 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  811486162 # total number of slip cycles
avg_sim_slip                29.1276 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
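The data cache dominates the alphaBlend statistics above. dl1.miss_rate is 0.2972, and ul2.accesses (3529698) equals dl1.misses + dl1.writebacks + il1.misses (2572213 + 957274 + 211). One rough way to fold these rates into a single number is the textbook average-memory-access-time formula, AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem). This is a textbook estimate, not a statistic that sim-outorder reports. The 72-cycle memory latency assumes the stock chunked-bus model: one 64-byte chunk on this run's 64-byte bus with `-mem:lat 72 5`.

```python
def amat(l1_hit: float, l1_miss_rate: float,
         l2_hit: float, l2_miss_rate: float, mem_latency: float) -> float:
    """Two-level average memory access time in cycles (textbook model).

    Ignores TLB misses, writeback cost, and the latency hiding an
    out-of-order core gets from overlapping misses.
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


if __name__ == "__main__":
    # alphaBlend run: -cache:dl1lat 3, -cache:dl2lat 12, dl1.miss_rate 0.2972,
    # ul2.miss_rate 0.0194, and an assumed 72-cycle memory access.
    print(f"{amat(3, 0.2972, 12, 0.0194, 72):.2f} cycles")  # 6.98 cycles
```

Treat the result, roughly 7 cycles per data reference, as indicative only. The core overlaps misses, and ul2.miss_rate mixes writeback and instruction traffic with data misses.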

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:07:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
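These runs use `-bpred bimod` with `-bpred:bimod 16384`, a table of 16384 two-bit saturating counters indexed by the branch address. The bpred_bimod.bpred_dir_rate statistic reports direction hits over updates. The sketch below models only the counter table. It does not model the BTB or the return-address stack. Two details are assumptions: dropping the low 3 PC bits (8-byte PISA instructions) and initialising every counter to weakly not-taken. The simulator's own index hash and initial state may differ. `BimodalPredictor` and `direction_rate` are hypothetical names.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch address bits."""

    def __init__(self, table_size: int = 16384, inst_shift: int = 3) -> None:
        if table_size <= 0 or table_size & (table_size - 1):
            raise ValueError("table size must be a positive power of two")
        self.mask = table_size - 1
        # Assumption: 8-byte PISA instructions, so the low 3 PC bits carry no information.
        self.shift = inst_shift
        # Assumption: every counter starts in the weakly-not-taken state (1).
        self.counters = [1] * table_size

    def index(self, pc: int) -> int:
        return (pc >> self.shift) & self.mask

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def direction_rate(predictor: BimodalPredictor, trace: list[tuple[int, bool]]) -> float:
    """Fraction of branches whose direction was predicted correctly."""
    hits = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            hits += 1
        predictor.update(pc, taken)
    return hits / len(trace)


if __name__ == "__main__":
    # Hypothetical loop branch at 0x00400200: taken 9 times, then falls through; 100 loop runs.
    loop = [(0x00400200, True)] * 9 + [(0x00400200, False)]
    print(direction_rate(BimodalPredictor(), loop * 100))  # 0.899
```

The counter's hysteresis means each loop exit costs one misprediction without flipping the next prediction. This is one reason regular loop code can reach direction rates close to 1.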



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6856087 # total simulation time in cycles
sim_IPC                      1.9117 # instructions per cycle
sim_CPI                      0.5231 # cycles per instruction
sim_exec_BW                  1.9179 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25523296 # cumulative IFQ occupancy
IFQ_fcount                  6255221 # cumulative IFQ full count
ifq_occupancy                3.7227 # avg IFQ occupancy (insn's)
ifq_rate                     1.9179 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9411 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105373337 # cumulative RUU occupancy
RUU_fcount                  5691382 # cumulative RUU full count
ruu_occupancy               15.3693 # avg RUU occupancy (insn's)
ruu_rate                     1.9179 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0138 # avg RUU occupant latency (cycle's)
ruu_full                     0.8301 # fraction of time (cycle's) RUU was full
LSQ_count                  32214266 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6986 # avg LSQ occupancy (insn's)
lsq_rate                     1.9179 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4499 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154638337 # total number of slip cycles
avg_sim_slip                11.7980 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
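
Many of the statistics above are ratios of the raw counters. The sketch below recomputes eleven of them from this run's counters. The formulas (for example occupancy = count / sim_cycle and latency = count / sim_total_insn) are inferred because they reproduce the printed values to four decimal places; they are not copied from the simulator source. `RUN1` and `derived_metrics` are hypothetical names.

```python
# Raw counters copied from the statistics of the run above.
RUN1 = {
    "sim_num_insn": 13107124,
    "sim_total_insn": 13148990,
    "sim_num_branches": 1010850,
    "sim_cycle": 6856087,
    "IFQ_count": 25523296,
    "IFQ_fcount": 6255221,
    "RUU_count": 105373337,
    "RUU_fcount": 5691382,
    "sim_slip": 154638337,
}


def derived_metrics(s: dict) -> dict:
    """Recompute ratio statistics from raw sim-outorder counters (inferred formulas)."""
    cycles, committed, executed = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    return {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "sim_exec_BW": executed / cycles,          # includes mis-speculated instructions
        "sim_IPB": committed / s["sim_num_branches"],
        "ifq_occupancy": s["IFQ_count"] / cycles,  # average entries per cycle
        "ifq_latency": s["IFQ_count"] / executed,  # average cycles an instruction spends in the IFQ
        "ifq_full": s["IFQ_fcount"] / cycles,
        "ruu_occupancy": s["RUU_count"] / cycles,
        "ruu_latency": s["RUU_count"] / executed,
        "ruu_full": s["RUU_fcount"] / cycles,
        "avg_sim_slip": s["sim_slip"] / committed,
    }


if __name__ == "__main__":
    for name, value in derived_metrics(RUN1).items():
        print(f"{name:14s} {value:.4f}")
```

The latency formulas are Little's law: average occupancy divided by the dispatch rate (sim_total_insn / sim_cycle) equals count / sim_total_insn. A nearly full 16-entry RUU (15.37 entries on average) draining at 1.92 instructions per cycle therefore holds each instruction for about 8 cycles.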

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:07:53 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
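
-mem:lat gives the latency of the first bus-width chunk and of each following chunk. Assuming the usual SimpleScalar model, in which a block moves as ceil(block size / -mem:width) chunks, a 64-byte ul2 block over the 8-byte bus costs 72 + 7 x 4 = 100 cycles in this run. The `-mem:lat 91 10` setting used for the matrix run at the top of the log gives 91 + 7 x 10 = 161 cycles. The -mem:maxBurstLength and -mem:minBurstLength options are not explained anywhere in this log, so the sketch (hypothetical `mem_block_latency`) ignores them.

```python
def mem_block_latency(block_bytes: int, bus_width: int, first_chunk: int, inter_chunk: int) -> int:
    """Cycles to transfer one block from memory under a simple chunked-bus model.

    The block is split into ceil(block_bytes / bus_width) bus transfers; the first
    costs `first_chunk` cycles and each later one `inter_chunk` cycles.
    Burst-length limits are deliberately not modelled.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_bytes // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    print(mem_block_latency(64, 8, 72, 4))   # 100 (this run: -mem:lat 72 4)
    print(mem_block_latency(64, 8, 91, 10))  # 161 (matrix run: -mem:lat 91 10)
```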


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264777 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6415797 # total simulation time in cycles
sim_IPC                      1.8052 # instructions per cycle
sim_CPI                      0.5540 # cycles per instruction
sim_exec_BW                  1.9117 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18955491 # cumulative IFQ occupancy
IFQ_fcount                  3910713 # cumulative IFQ full count
ifq_occupancy                2.9545 # avg IFQ occupancy (insn's)
ifq_rate                     1.9117 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5455 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6095 # fraction of time (cycle's) IFQ was full
RUU_count                  78068440 # cumulative RUU occupancy
RUU_fcount                  3308769 # cumulative RUU full count
ruu_occupancy               12.1682 # avg RUU occupancy (insn's)
ruu_rate                     1.9117 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3653 # avg RUU occupant latency (cycle's)
ruu_full                     0.5157 # fraction of time (cycle's) RUU was full
LSQ_count                  32307282 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0356 # avg LSQ occupancy (insn's)
lsq_rate                     1.9117 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6342 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124442818 # total number of slip cycles
avg_sim_slip                10.7450 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
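
The predictor and cache rates are also simple ratios. The sketch below recomputes them from the sort counters above. It returns None where the denominator is zero, the case the previous run prints as `<error: divide by zero>` because it saw no non-RAS JRs. Two consistency checks hold for both runs in this section. First, bpred misses equal updates minus dir_hits (3128464 - 2990777 = 137687). Second, ul2 accesses equal il1 misses plus dl1 misses plus dl1 writebacks (217 + 11205 + 7079 = 18501, and 178 + 5727 + 492 = 6397 for the earlier run), as expected when il2 is pointed at dl2. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def rate(num: int, den: int) -> Optional[float]:
    """Ratio statistic; None where sim-outorder prints '<error: divide by zero>'."""
    return num / den if den else None


# Counters copied from the sort run above.
BPRED = {
    "updates": 3128464, "addr_hits": 2984765, "dir_hits": 2990777,
    "jr_hits": 427904, "jr_seen": 434901,
    "jr_non_ras_hits": 1223, "jr_non_ras_seen": 8193,
    "used_ras": 426708, "ras_hits": 426681,
}
CACHES: Dict[str, Tuple[int, int, int]] = {  # name: (accesses, misses, writebacks)
    "il1": (12820215, 217, 0),
    "dl1": (4497784, 11205, 7079),
    "ul2": (18501, 4196, 0),
}


def bpred_rates(b: dict) -> dict:
    return {
        "misses": b["updates"] - b["dir_hits"],
        "bpred_addr_rate": rate(b["addr_hits"], b["updates"]),
        "bpred_dir_rate": rate(b["dir_hits"], b["updates"]),
        "bpred_jr_rate": rate(b["jr_hits"], b["jr_seen"]),
        "bpred_jr_non_ras_rate": rate(b["jr_non_ras_hits"], b["jr_non_ras_seen"]),
        "ras_rate": rate(b["ras_hits"], b["used_ras"]),
    }


def miss_rates(caches: dict) -> dict:
    return {name: rate(misses, acc) for name, (acc, misses, _wb) in caches.items()}


if __name__ == "__main__":
    print(bpred_rates(BPRED))
    print(miss_rates(CACHES))
    il1, dl1 = CACHES["il1"], CACHES["dl1"]
    # With il2 pointed at dl2, ul2 sees il1 misses, dl1 misses and dl1 writebacks.
    print(il1[1] + dl1[1] + dl1[2] == CACHES["ul2"][0])  # True
```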

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:08:01 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
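
The TLB and BTB settings translate into entry counts the same way cache geometry does. Assuming, as in the cache format, that a TLB's `<bsize>` field is the page size, itlb:16:4096:4:l holds 64 entries covering 256 KiB and dtlb:32:4096:4:l holds 128 entries covering 512 KiB. `-bpred:btb 512 4` gives 2048 BTB entries. The 4 KiB page size is consistent with the mem.page_mem lines, where 24 pages are reported as 96k and 73 pages as 292k. `tlb_geometry`, `btb_entries` and `page_mem_kib` are hypothetical names.

```python
from typing import Tuple

PAGE_BYTES = 4096  # consistent with mem.page_mem = mem.page_count x 4k in this log


def tlb_geometry(spec: str) -> Tuple[int, int]:
    """Return (entries, reach in bytes) for a '<name>:<nsets>:<page>:<assoc>:<repl>' TLB spec."""
    _name, nsets, page, assoc, _repl = spec.split(":")
    entries = int(nsets) * int(assoc)
    return entries, entries * int(page)


def btb_entries(num_sets: int, assoc: int) -> int:
    """Total BTB entries for '-bpred:btb <num_sets> <associativity>'."""
    return num_sets * assoc


def page_mem_kib(page_count: int) -> int:
    """Allocated simulator memory in KiB, assuming PAGE_BYTES per page."""
    return page_count * PAGE_BYTES // 1024


if __name__ == "__main__":
    for spec in ("itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        n, reach = tlb_geometry(spec)
        print(spec, n, "entries", reach // 1024, "KiB reach")
    print("btb", btb_entries(512, 4), "entries")
    print(page_mem_kib(24), page_mem_kib(73))  # 96 292
```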


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375268 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9706372 # total simulation time in cycles
sim_IPC                      1.3724 # instructions per cycle
sim_CPI                      0.7287 # cycles per instruction
sim_exec_BW                  1.3780 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  37801309 # cumulative IFQ occupancy
IFQ_fcount                  9299683 # cumulative IFQ full count
ifq_occupancy                3.8945 # avg IFQ occupancy (insn's)
ifq_rate                     1.3780 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.8262 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9581 # fraction of time (cycle's) IFQ was full
RUU_count                 152122238 # cumulative RUU occupancy
RUU_fcount                  9158726 # cumulative RUU full count
ruu_occupancy               15.6724 # avg RUU occupancy (insn's)
ruu_rate                     1.3780 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.3734 # avg RUU occupant latency (cycle's)
ruu_full                     0.9436 # fraction of time (cycle's) RUU was full
LSQ_count                  80297280 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2726 # avg LSQ occupancy (insn's)
lsq_rate                     1.3780 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.0034 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  252134484 # total number of slip cycles
avg_sim_slip                18.9277 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
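  Most of the summary statistics above are ratios of the raw counters. The
  sketch below recomputes them for this run. The formulas are inferred from
  the stat descriptions and from the fact that they reproduce the printed
  values; they are not taken from the simulator source. In particular, the
  occupancy and full-fraction stats divide by sim_cycle, while the queue
  latencies divide by sim_total_insn (committed plus mis-speculated).

```python
from __future__ import annotations

# Raw counters copied from the statistics block above.
raw = {
    "sim_num_insn": 13320908, "sim_total_insn": 13375268, "sim_num_branches": 387182,
    "sim_cycle": 9706372, "sim_elapsed_time": 12,
    "IFQ_count": 37801309, "IFQ_fcount": 9299683,
    "RUU_count": 152122238, "RUU_fcount": 9158726,
    "LSQ_count": 80297280, "sim_slip": 252134484,
    "bpred_bimod.updates": 387182, "bpred_bimod.addr_hits": 380510,
    "bpred_bimod.dir_hits": 380716,
}

# Values as printed (4 decimal places), for comparison.
reported = {
    "sim_IPC": 1.3724, "sim_CPI": 0.7287, "sim_exec_BW": 1.3780, "sim_IPB": 34.4048,
    "sim_inst_rate": 1110075.6667,
    "ifq_occupancy": 3.8945, "ifq_latency": 2.8262, "ifq_full": 0.9581,
    "ruu_occupancy": 15.6724, "ruu_latency": 11.3734, "ruu_full": 0.9436,
    "lsq_occupancy": 8.2726, "lsq_latency": 6.0034, "avg_sim_slip": 18.9277,
    "bpred_bimod.bpred_addr_rate": 0.9828, "bpred_bimod.bpred_dir_rate": 0.9833,
}


def derived_stats(s: dict[str, int]) -> dict[str, float]:
    """Recompute ratio statistics from raw counters (formulas inferred, see above)."""
    cycles, committed, executed = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    updates = s["bpred_bimod.updates"]
    return {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "sim_exec_BW": executed / cycles,
        "sim_IPB": committed / s["sim_num_branches"],
        "sim_inst_rate": committed / s["sim_elapsed_time"],
        "ifq_occupancy": s["IFQ_count"] / cycles,
        "ifq_latency": s["IFQ_count"] / executed,
        "ifq_full": s["IFQ_fcount"] / cycles,
        "ruu_occupancy": s["RUU_count"] / cycles,
        "ruu_latency": s["RUU_count"] / executed,
        "ruu_full": s["RUU_fcount"] / cycles,
        "lsq_occupancy": s["LSQ_count"] / cycles,
        "lsq_latency": s["LSQ_count"] / executed,
        "avg_sim_slip": s["sim_slip"] / committed,
        "bpred_bimod.bpred_addr_rate": s["bpred_bimod.addr_hits"] / updates,
        "bpred_bimod.bpred_dir_rate": s["bpred_bimod.dir_hits"] / updates,
    }


for name, value in derived_stats(raw).items():
    print(f"{name:28s} recomputed {value:14.4f}   printed {reported[name]:14.4f}")
```

  Note that sim_inst_rate depends on wall-clock sim_elapsed_time, which is
  only reported in whole seconds, so it measures the host machine rather than
  the simulated processor.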

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:08:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
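  The -mem:lat <first_chunk> <inter_chunk> pair together with -mem:width
  suggests a chunked transfer model: the first bus-width chunk of a block costs
  first_chunk cycles and each later chunk costs inter_chunk cycles. The sketch
  applies that model to a 64-byte ul2 block. It assumes blocks are rounded up
  to whole chunks and ignores -mem:maxBurstLength and -mem:minBurstLength,
  whose effect is not described in this output.

```python
def mem_access_latency(block_bytes: int, bus_bytes: int, first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from memory over a bus `bus_bytes` wide (chunked model)."""
    chunks = (block_bytes + bus_bytes - 1) // bus_bytes  # bus transfers per block, rounded up
    return first_chunk + inter_chunk * (chunks - 1)


# This run: ul2 blocks of 64 bytes, -mem:width 8, -mem:lat 72 4
print(mem_access_latency(64, 8, 72, 4))
# The matrix run's -mem:lat 91 10 with the same block size and bus width
print(mem_access_latency(64, 8, 91, 10))
```

  Under that model a ul2 miss in this configuration costs 72 + 4 x 7 = 100
  cycles, versus 91 + 10 x 7 = 161 cycles with the -mem:lat 91 10 setting used
  for the matrix run.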


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12618838 # total simulation time in cycles
sim_IPC                      1.6942 # instructions per cycle
sim_CPI                      0.5902 # cycles per instruction
sim_exec_BW                  1.6998 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49312486 # cumulative IFQ occupancy
IFQ_fcount                 11708949 # cumulative IFQ full count
ifq_occupancy                3.9078 # avg IFQ occupancy (insn's)
ifq_rate                     1.6998 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2989 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200387453 # cumulative RUU occupancy
RUU_fcount                 12488824 # cumulative RUU full count
ruu_occupancy               15.8800 # avg RUU occupancy (insn's)
ruu_rate                     1.6998 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3420 # avg RUU occupant latency (cycle's)
ruu_full                     0.9897 # fraction of time (cycle's) RUU was full
LSQ_count                  64236691 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0905 # avg LSQ occupancy (insn's)
lsq_rate                     1.6998 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9947 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292406837 # total number of slip cycles
avg_sim_slip                13.6771 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
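  The memory-hierarchy counters of the two runs above can be cross-checked.
  Because -cache:il2 points at dl2, every il1 miss, every dl1 miss and every
  dl1 writeback should become a ul2 access, and that sum matches ul2.accesses
  exactly in both runs. The sketch also assumes a 4096-byte simulator page
  (the page size used in the TLB configs), which reproduces mem.page_mem, and
  shows how a zero denominator corresponds to the "<error: divide by zero>"
  entry for bpred_bimod.bpred_jr_non_ras_rate.PP.

```python
PAGE_BYTES = 4096  # assumption: simulator page size, matching the TLB page size

runs = {
    "previous run": {"il1.misses": 727, "dl1.misses": 287722, "dl1.writebacks": 143444,
                     "ul2.accesses": 431893, "mem.page_count": 190},
    "filter": {"il1.misses": 179, "dl1.misses": 3131, "dl1.writebacks": 791,
               "ul2.accesses": 4101, "mem.page_count": 30},
}


def l2_demand(s: dict) -> int:
    """ul2 accesses implied by L1 traffic when il2 is pointed at dl2."""
    return s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]


def safe_ratio(num: int, den: int):
    """num/den, or None where sim-outorder prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


for label, s in runs.items():
    print(label, "implied ul2 accesses:", l2_demand(s), "printed:", s["ul2.accesses"],
          "page_mem:", s["mem.page_count"] * PAGE_BYTES // 1024, "k")

# filter: 0 non-RAS JR hits out of 0 non-RAS JRs seen
print("jr_non_ras_rate:", safe_ratio(0, 0))
print("ul2.miss_rate:", safe_ratio(2489, 4101))
```

  The identity is an observation that holds for these two runs, not a
  documented invariant; a run with -cache:il2 configured as a separate cache
  would not satisfy it.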

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:08:26 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861395 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  38427865 # total simulation time in cycles
sim_IPC                      0.7250 # instructions per cycle
sim_CPI                      1.3793 # cycles per instruction
sim_exec_BW                  0.7250 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 153620425 # cumulative IFQ occupancy
IFQ_fcount                 38404866 # cumulative IFQ full count
ifq_occupancy                3.9976 # avg IFQ occupancy (insn's)
ifq_rate                     0.7250 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5137 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 614485208 # cumulative RUU occupancy
RUU_fcount                 38404094 # cumulative RUU full count
ruu_occupancy               15.9906 # avg RUU occupancy (insn's)
ruu_rate                     0.7250 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.0551 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 186756647 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8599 # avg LSQ occupancy (insn's)
lsq_rate                     0.7250 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.7031 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  837750442 # total number of slip cycles
avg_sim_slip                30.0703 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
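Most ratio statistics in the report above are derived from raw counters in the same report. The sketch below recomputes several of them from this run's counts. The formulas are inferred because they reproduce the reported four-decimal values: occupancy is the cumulative count divided by `sim_cycle`, and occupant latency is the cumulative count divided by `sim_total_insn`. They should be checked against the simulator source before being relied on.

```python
# Raw counters copied from the statistics report above.
stats = {
    "sim_num_insn": 27859692,
    "sim_total_insn": 27861395,
    "sim_num_branches": 481706,
    "sim_cycle": 38427865,
    "IFQ_count": 153620425,
    "RUU_count": 614485208,
    "dl1.accesses": 8653398,
    "dl1.misses": 2572213,
    "ul2.accesses": 3529698,
    "ul2.misses": 68321,
}


def derived(s: dict) -> dict:
    """Recompute selected ratio statistics from raw counters (inferred formulas)."""
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        # Average queue occupancy: cumulative occupancy divided by cycles.
        "ifq_occupancy": s["IFQ_count"] / s["sim_cycle"],
        "ruu_occupancy": s["RUU_count"] / s["sim_cycle"],
        # Average residence time: cumulative occupancy per executed instruction.
        "ifq_latency": s["IFQ_count"] / s["sim_total_insn"],
        "ruu_latency": s["RUU_count"] / s["sim_total_insn"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


for name, value in derived(stats).items():
    print(f"{name:14s} {value:.4f}")
```

The latency figures follow Little's law: occupancy divided by dispatch rate gives the same value (for the RUU, 15.9906 / 0.7250 is about 22.06 cycles). The RUU is full 99.94% of the time (`ruu_full`) and the dl1 miss rate is near 30%. Together these suggest that this run is limited by the 16-entry RUU waiting on data-cache misses.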

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:08:50 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
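Cache and TLB configurations use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, where `<repl>` is `l` (LRU), `f` (FIFO) or `r` (random). The sketch below parses these strings and reports total capacity as `nsets * bsize * assoc`. It assumes, as an interpretation, that a TLB's `<bsize>` is the page size, so the same product gives the TLB's reach. The alias forms (`dl1`, `dl2`, `none`) are not handled.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (assumed to be the page size for TLBs)
    assoc: int
    repl: str   # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def blocks(self) -> int:
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        # For a TLB this is its reach: entries * page size.
        return self.blocks * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>; raises ValueError otherwise."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(spec)
    print(f"{c.name:5s} {c.blocks:6d} blocks, {c.capacity_bytes // 1024:4d} KiB")
```

For this configuration each L1 cache holds 64 KiB and the unified L2 holds 512 KiB. The TLBs map 256 KiB (itlb, 64 entries) and 512 KiB (dtlb, 128 entries) of address space.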

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6824679 # total simulation time in cycles
sim_IPC                      1.9205 # instructions per cycle
sim_CPI                      0.5207 # cycles per instruction
sim_exec_BW                  1.9267 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25408864 # cumulative IFQ occupancy
IFQ_fcount                  6226613 # cumulative IFQ full count
ifq_occupancy                3.7231 # avg IFQ occupancy (insn's)
ifq_rate                     1.9267 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9324 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104914361 # cumulative RUU occupancy
RUU_fcount                  5662774 # cumulative RUU full count
ruu_occupancy               15.3728 # avg RUU occupancy (insn's)
ruu_rate                     1.9267 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9789 # avg RUU occupant latency (cycle's)
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
LSQ_count                  32060922 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6978 # avg LSQ occupancy (insn's)
lsq_rate                     1.9267 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4383 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154026305 # total number of slip cycles
avg_sim_slip                11.7513 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
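This run's option dump sets `-cache:il2 dl2`, which points the instruction side at the data L2, so `ul2` is a unified cache. In both runs above, `ul2.accesses` equals il1 misses plus dl1 misses plus dl1 writebacks. The sketch below performs that consistency check and recomputes `ul2.miss_rate`. The decomposition is inferred from the counters, not taken from the simulator source.

```python
def expected_ul2_accesses(il1_misses: int, dl1_misses: int, dl1_writebacks: int) -> int:
    # Each L1 miss fetches a block from ul2; each dirty dl1 eviction writes one back.
    return il1_misses + dl1_misses + dl1_writebacks


# (il1.misses, dl1.misses, dl1.writebacks, ul2.accesses, ul2.misses)
runs = {
    "preceding run": (211, 2572213, 957274, 3529698, 68321),
    "matrix": (178, 5727, 492, 6397, 2268),
}

for name, (il1_m, dl1_m, dl1_wb, ul2_acc, ul2_miss) in runs.items():
    predicted = expected_ul2_accesses(il1_m, dl1_m, dl1_wb)
    print(f"{name}: predicted {predicted}, reported {ul2_acc}, "
          f"miss rate {ul2_miss / ul2_acc:.4f}")
```

For `matrix`, `ul2.replacements` and `ul2.invalidations` are both 0, so every ul2 miss filled an empty frame, meaning each was a compulsory miss. That is why its ul2 miss rate (0.3545) is high even though ul2 sees very little traffic.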

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:08:58 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
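This run's memory options (`-mem:lat 72 4`, `-mem:width 16`) and the 64-byte `ul2` block size together set the cost of an L2 miss. The sketch below follows stock sim-outorder's latency model. In that model a block transfer costs the first-chunk latency, plus the inter-chunk latency for each further bus-width chunk. This build also accepts `-mem:maxBurstLength` and `-mem:minBurstLength`, which the stock model lacks. Their effect is not modeled here, so treat the results as an estimate.

```python
def mem_access_latency(blk_sz: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to transfer one block over the memory bus (stock sim-outorder model)."""
    # Number of bus-width transfers needed, rounded up.
    chunks = (blk_sz + bus_width - 1) // bus_width
    assert chunks > 0
    # The first chunk pays the full access latency; each later chunk the inter-chunk delay.
    # Burst-length limits (-mem:maxBurstLength / -mem:minBurstLength) are NOT modeled.
    return first_chunk + inter_chunk * (chunks - 1)


# This run: -mem:lat 72 4, -mem:width 16, 64-byte ul2 blocks.
print(mem_access_latency(64, 72, 4, 16))
# First run in this log: -mem:lat 91 10, -mem:width 8, 64-byte ul2 blocks.
print(mem_access_latency(64, 91, 10, 8))
```

Compared with the first run in this log (`-mem:lat 91 10 -mem:width 8`), the wider bus and lower latencies nearly halve the modeled block-fill time, from 161 to 84 cycles.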

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X  (the order taken by -bpred:2lev)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
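The hypothetical helper below makes the sample table concrete. `twolev_args` is an illustrative name, not a SimpleScalar function. It builds the four numbers for `-bpred:2lev` in the order that option takes, `<l1size> <l2size> <hist_size> <xor>`, as printed in the option dump. M is treated as an explicit input for GAp and PAp rather than derived.

```python
def twolev_args(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Return '<l1size> <l2size> <hist_size> <xor>' for a sample 2-level scheme.

    w: history (shift register) width W
    n: first-level entries N (per-address schemes only)
    m: second-level entries M (GAp and PAp only)
    """
    two_w = 1 << w  # 2^W second-level counters
    if scheme == "GAg":
        cfg = (1, two_w, w, 0)
    elif scheme == "gshare":
        cfg = (1, two_w, w, 1)  # same table sizes as GAg, but history XOR address
    elif scheme == "PAg":
        cfg = (n, two_w, w, 0)
    elif scheme == "GAp":
        if m <= two_w:
            raise ValueError("GAp needs M > 2^W")
        cfg = (1, m, w, 0)
    elif scheme == "PAp":
        if m <= 0:
            raise ValueError("PAp needs an explicit M")
        cfg = (n, m, w, 0)
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    return " ".join(str(v) for v in cfg)


print("-bpred 2lev -bpred:2lev", twolev_args("gshare", w=10))
```

With W = 8 and M = 1024, the GAp case reproduces the `-bpred:2lev 1 1024 8 0` default shown in the option dump.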

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
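A cache's capacity follows directly from this format as nsets * bsize * assoc. The sketch below parses a `<config>` string and applies that product; `parse_cache_config` and `CacheConfig` are hypothetical names. For a TLB, the 4096 in the TLB configs suggests the block size is the page size. Under that assumption, the same product gives the address range the TLB can map rather than a storage size.

```python
from typing import NamedTuple

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB, by assumption)
    assoc: int
    repl: str    # 'l', 'f', or 'r'

    @property
    def capacity_bytes(self) -> int:
        # Every set holds `assoc` blocks of `bsize` bytes.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"]:
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KiB, "
          f"{cfg.assoc}-way, {REPLACEMENT[cfg.repl]}")
```

Applied to the configurations used in these runs, this gives 64 KiB each for dl1 and il1 (512:64:2) and 512 KiB for the unified ul2 (1024:64:8).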

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264713 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6372245 # total simulation time in cycles
sim_IPC                      1.8175 # instructions per cycle
sim_CPI                      0.5502 # cycles per instruction
sim_exec_BW                  1.9247 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18794915 # cumulative IFQ occupancy
IFQ_fcount                  3870569 # cumulative IFQ full count
ifq_occupancy                2.9495 # avg IFQ occupancy (insn's)
ifq_rate                     1.9247 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5324 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6074 # fraction of time (cycle's) IFQ was full
RUU_count                  77425798 # cumulative RUU occupancy
RUU_fcount                  3268641 # cumulative RUU full count
ruu_occupancy               12.1505 # avg RUU occupancy (insn's)
ruu_rate                     1.9247 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3129 # avg RUU occupant latency (cycle's)
ruu_full                     0.5129 # fraction of time (cycle's) RUU was full
LSQ_count                  32162368 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0473 # avg LSQ occupancy (insn's)
lsq_rate                     1.9247 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6223 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123655262 # total number of slip cycles
avg_sim_slip                10.6770 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
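Several of the derived statistics above are simple ratios of raw counters:
- `sim_IPC` is `sim_num_insn / sim_cycle`.
- `sim_CPI` is its reciprocal.
- `sim_IPB` is `sim_num_insn / sim_num_branches`.

The sketch below parses a few of this run's `name value # description` lines and recomputes these ratios; the helper names are hypothetical. It also checks a relation these counters satisfy with a unified L2 (the option dumps show `-cache:il2 dl2`): `ul2.accesses` equals `dl1.misses + dl1.writebacks + il1.misses` (11205 + 7079 + 217 = 18501). That relation is observed in this log, not taken from the simulator's documentation.

```python
import re
from typing import Dict

# Matches "name   value # description" lines from the statistics dump.
STAT_LINE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>\S+)\s+#")

# A few counters copied verbatim from the run above.
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_branches            3128464 # total number of branches committed
sim_cycle                   6372245 # total simulation time in cycles
il1.misses                      217 # total number of misses
dl1.misses                    11205 # total number of misses
dl1.writebacks                 7079 # total number of writebacks
ul2.accesses                  18501 # total number of accesses
"""


def parse_stats(text: str) -> Dict[str, float]:
    """Collect numeric statistics; non-numeric values (e.g. '292k') are skipped."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        try:
            stats[match["name"]] = float(match["value"])
        except ValueError:
            pass
    return stats


stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
ipb = stats["sim_num_insn"] / stats["sim_num_branches"]
print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  IPB {ipb:.4f}")

# Assumption consistent with these counters: each il1 miss, dl1 miss and
# dl1 writeback is one access to the shared ul2.
l2_from_l1 = stats["dl1.misses"] + stats["dl1.writebacks"] + stats["il1.misses"]
print("ul2.accesses matches L1 traffic:", l2_from_l1 == stats["ul2.accesses"])
```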

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:09:06 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375140 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9406027 # total simulation time in cycles
sim_IPC                      1.4162 # instructions per cycle
sim_CPI                      0.7061 # cycles per instruction
sim_exec_BW                  1.4220 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36645627 # cumulative IFQ occupancy
IFQ_fcount                  9010762 # cumulative IFQ full count
ifq_occupancy                3.8960 # avg IFQ occupancy (insn's)
ifq_rate                     1.4220 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7398 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 147495681 # cumulative RUU occupancy
RUU_fcount                  8869837 # cumulative RUU full count
ruu_occupancy               15.6810 # avg RUU occupancy (insn's)
ruu_rate                     1.4220 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.0276 # avg RUU occupant latency (cycle's)
ruu_full                     0.9430 # fraction of time (cycle's) RUU was full
LSQ_count                  77543967 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2441 # avg LSQ occupancy (insn's)
lsq_rate                     1.4220 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7976 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  244754934 # total number of slip cycles
avg_sim_slip                18.3737 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
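The rates printed for the fft run can be checked the same way, because each comment names its ratio. For example, `bpred_addr_rate` is addr-hits/updates and `miss_rate` is misses/ref. The sketch below copies a few counters from that run and confirms each printed rate to within 5e-5, the error implied by four-decimal rounding; the function name is hypothetical. In this excerpt, `bpred_bimod.misses` also equals `updates - dir_hits` for both runs (387182 - 380716 = 6466 here). Treat that as an observation about these numbers rather than a documented invariant.

```python
from typing import Dict, List, Tuple

# Raw counters copied from the fft run above.
COUNTERS: Dict[str, int] = {
    "bpred_bimod.updates": 387182,
    "bpred_bimod.addr_hits": 380510,
    "bpred_bimod.dir_hits": 380716,
    "bpred_bimod.misses": 6466,
    "bpred_bimod.jr_hits": 89850,
    "bpred_bimod.jr_seen": 89860,
    "dl1.accesses": 6169439,
    "dl1.misses": 287722,
    "ul2.accesses": 431893,
    "ul2.misses": 32394,
}

# Printed rate -> (printed value, numerator counter, denominator counter).
PRINTED: Dict[str, Tuple[float, str, str]] = {
    "bpred_bimod.bpred_addr_rate": (0.9828, "bpred_bimod.addr_hits", "bpred_bimod.updates"),
    "bpred_bimod.bpred_dir_rate": (0.9833, "bpred_bimod.dir_hits", "bpred_bimod.updates"),
    "bpred_bimod.bpred_jr_rate": (0.9999, "bpred_bimod.jr_hits", "bpred_bimod.jr_seen"),
    "dl1.miss_rate": (0.0466, "dl1.misses", "dl1.accesses"),
    "ul2.miss_rate": (0.0750, "ul2.misses", "ul2.accesses"),
}


def mismatched_rates(counters: Dict[str, int],
                     printed: Dict[str, Tuple[float, str, str]],
                     tol: float = 5e-5) -> List[str]:
    """Names of printed rates that differ from numerator/denominator by more than tol."""
    bad = []
    for name, (value, num, den) in printed.items():
        if abs(counters[num] / counters[den] - value) > tol:
            bad.append(name)
    return bad


print("mismatched rates:", mismatched_rates(COUNTERS, PRINTED))
misses = COUNTERS["bpred_bimod.updates"] - COUNTERS["bpred_bimod.dir_hits"]
print("misses == updates - dir_hits:", misses == COUNTERS["bpred_bimod.misses"])
```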

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:09:17 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12582902 # total simulation time in cycles
sim_IPC                      1.6991 # instructions per cycle
sim_CPI                      0.5886 # cycles per instruction
sim_exec_BW                  1.7047 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49179878 # cumulative IFQ occupancy
IFQ_fcount                 11675797 # cumulative IFQ full count
ifq_occupancy                3.9085 # avg IFQ occupancy (insn's)
ifq_rate                     1.7047 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2928 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199856237 # cumulative RUU occupancy
RUU_fcount                 12455672 # cumulative RUU full count
ruu_occupancy               15.8832 # avg RUU occupancy (insn's)
ruu_rate                     1.7047 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3173 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64070099 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0918 # avg LSQ occupancy (insn's)
lsq_rate                     1.7047 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9869 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291709125 # total number of slip cycles
avg_sim_slip                13.6445 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
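The statistics above follow a `name value # description` layout, and several of them are ratios of other counters (for example sim_IPC = sim_num_insn / sim_cycle and dl1.miss_rate = dl1.misses / dl1.accesses). The following is a minimal Python sketch for pulling the values out of a log like this one and recomputing a few derived metrics. The function names `parse_stats` and `derived_metrics` are hypothetical, and the sample text is copied from the run above.

```python
SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_cycle                  12582902 # total simulation time in cycles
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

def parse_stats(text):
    """Map each `name value # description` line to its value.

    Values that parse as numbers become floats; anything else
    (hex addresses, "120k", "<error: divide by zero>") stays a string.
    """
    stats = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()  # drop the trailing description
        parts = body.split(None, 1)
        if len(parts) != 2:
            continue  # blank lines, banners, separators
        name, value = parts
        try:
            stats[name] = float(value)
        except ValueError:
            stats[name] = value
    return stats

def derived_metrics(s):
    """Recompute ratios that sim-outorder reports as formula statistics."""
    return {
        "IPC": s["sim_num_insn"] / s["sim_cycle"],
        "CPI": s["sim_cycle"] / s["sim_num_insn"],
        "IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    for name, value in derived_metrics(stats).items():
        print(f"{name:16s} {value:.4f}")
```

Rounded to four places, the recomputed values match the sim_IPC, sim_CPI, sim_IPB, bpred_dir_rate and miss_rate lines printed by the simulator.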

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:09:31 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861267 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37704201 # total simulation time in cycles
sim_IPC                      0.7389 # instructions per cycle
sim_CPI                      1.3534 # cycles per instruction
sim_exec_BW                  0.7389 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 150739145 # cumulative IFQ occupancy
IFQ_fcount                 37684546 # cumulative IFQ full count
ifq_occupancy                3.9979 # avg IFQ occupancy (insn's)
ifq_rate                     0.7389 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4103 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 602959400 # cumulative RUU occupancy
RUU_fcount                 37683806 # cumulative RUU full count
ruu_occupancy               15.9918 # avg RUU occupancy (insn's)
ruu_rate                     0.7389 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.6415 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 183274231 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8608 # avg LSQ occupancy (insn's)
lsq_rate                     0.7389 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5781 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  822742282 # total number of slip cycles
avg_sim_slip                29.5316 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
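The alphaBlend run above shows a dl1 miss rate of 0.2972, compared with 0.0006 for the preceding run, and its IPC drops to 0.7389 from 1.6991. A rough average memory access time (AMAT) can be estimated from its configuration. The Python sketch below assumes the classic SimpleScalar memory model, in which a block costs the first-chunk latency plus one inter-chunk latency per remaining bus-width chunk. Under that model, `-mem:lat 72 4` with `-mem:width 16` and 64-byte blocks gives 72 + 3*4 = 84 cycles. Several caveats apply:

- The `-mem:maxBurstLength` and `-mem:minBurstLength` options in this build are not modeled.
- ul2 is unified, so its miss rate also covers instruction fetches.
- An out-of-order core overlaps much of this latency.

Treat the result as an order-of-magnitude figure only. The helper names are hypothetical.

```python
import math

def mem_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to transfer one block from memory.

    Assumed model: the first bus-width chunk costs `first_chunk` cycles and
    each remaining chunk costs `inter_chunk`; burst-length limits are ignored.
    """
    chunks = math.ceil(block_bytes / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)

def amat(l1_lat, l1_miss_rate, l2_lat, l2_miss_rate, mem_lat):
    """Textbook average memory access time for a two-level hierarchy."""
    return l1_lat + l1_miss_rate * (l2_lat + l2_miss_rate * mem_lat)

# Inputs taken from the alphaBlend configuration and statistics above.
mem = mem_latency(block_bytes=64, bus_width=16, first_chunk=72, inter_chunk=4)
estimate = amat(l1_lat=3, l1_miss_rate=0.2972, l2_lat=12,
                l2_miss_rate=0.0194, mem_lat=mem)
print(f"memory latency ~{mem} cycles, AMAT ~{estimate:.2f} cycles")
```

Under these assumptions the estimate comes to about 7 cycles per data access, against a 3-cycle dl1 hit latency.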

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:09:54 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
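The option dump above has one `-name value # description` line per active option. Unset or informational options are commented out as `# -name ...`. Below is a minimal Python sketch that turns such a dump into a dictionary, for example to diff the configurations of two runs. It assumes every active line follows that layout. `parse_option_dump` is a hypothetical helper and is not part of SimpleScalar.

```python
import re

# An active option line starts with "-name", then one or more value
# tokens, then " # " and a description. Lines that begin with "#"
# (e.g. "# -pcstat <null>") are unset/informational and never match.
OPTION_LINE = re.compile(r"^(-\S+)\s+(.*?)\s+#\s")


def parse_option_dump(text: str) -> dict[str, list[str]]:
    """Map each active option name to its list of value tokens."""
    options: dict[str, list[str]] = {}
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2).split()
    return options


if __name__ == "__main__":
    sample = "\n".join([
        "-ruu:size                  16 # register update unit (RUU) size",
        "-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)",
        "# -pcstat              <null> # profile stat(s) against text addr's",
    ])
    print(parse_option_dump(sample))
```

Multi-valued options such as `-mem:lat` or `-bpred:2lev` keep all of their tokens, so callers can convert them as needed.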

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced, and those that start with `#' designate a cycle count
  range.  All other range values represent an instruction count range.  An
  end value prefixed with `+' is relative to the start value, e.g.,
  1000:+100 == 1000:1100.  Program symbols may be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
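The range grammar above can be mirrored in a few lines of Python. This sketch makes two assumptions. Numeric endpoints are taken to be decimal or `0x`-prefixed hexadecimal. Program symbols such as `main` are left unresolved, because resolving them needs the simulated program's symbol table. `parse_ptrace_range` and its return shape are hypothetical and are not the simulator's own parser.

```python
# Prefix characters from the help text: `@' = address, `#' = cycle count;
# anything else is an instruction count.
KINDS = {"@": "address", "#": "cycle"}


def _to_number(text):
    """Parse decimal or 0x-hex; leave anything else (a symbol) as a string."""
    try:
        return int(text, 16) if text.lower().startswith("0x") else int(text)
    except ValueError:
        return text


def _endpoint(token):
    """Return (kind, value) for one endpoint, or None if it was omitted."""
    if not token:
        return None
    if token[0] in KINDS:
        return KINDS[token[0]], _to_number(token[1:])
    return "insn", _to_number(token)


def parse_ptrace_range(spec):
    """Split '<start>:<end>' into a pair of endpoints (either may be None)."""
    start_tok, sep, end_tok = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in pipetrace range {spec!r}")
    start = _endpoint(start_tok)
    if end_tok.startswith("+"):
        offset = _to_number(end_tok[1:])
        if start is not None and isinstance(start[1], int) and isinstance(offset, int):
            # `+' is relative to the start, e.g. 1000:+100 == 1000:1100
            return start, (start[0], start[1] + offset)
        # Symbolic start: the offset cannot be applied without a symbol table.
        return start, ("relative", offset)
    return start, _endpoint(end_tok)


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(spec, "->", parse_ptrace_range(spec))
```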

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
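The sample predictors above list their parameters in N, W, M, X order. The `-bpred:2lev` option in the dump takes `<l1size> <l2size> <hist_size> <xor>`, which is N, M, W, X. The small Python sketch below reorders the values and labels a configuration using only the sample table. `twolev_option` and `classify_twolev` are hypothetical helpers. PAp is deliberately not detected, because the sketch does not model its size rule.

```python
def twolev_option(n: int, w: int, m: int, x: int) -> str:
    """Build -bpred:2lev from the help text's N, W, M, X sample order.

    The option itself expects <l1size> <l2size> <hist_size> <xor>,
    i.e. N, M, W, X, so M and W swap places here.
    """
    return f"-bpred:2lev {n} {m} {w} {x}"


def classify_twolev(n: int, w: int, m: int, x: int) -> str:
    """Name the sample scheme a configuration matches, if any.

    Only the GAg, GAp, PAg and gshare rows of the sample table are
    checked; PAp and other combinations are reported as "other".
    """
    table = 2 ** w  # counters addressable by a W-bit history
    if n == 1 and m == table:
        return "gshare" if x == 1 else "GAg"
    if n == 1 and m > table and x == 0:
        return "GAp"
    if n > 1 and m == table and x == 0:
        return "PAg"
    return "other"


if __name__ == "__main__":
    # The dump's default "-bpred:2lev 1 1024 8 0" is N=1, M=1024, W=8, X=0.
    print(classify_twolev(n=1, w=8, m=1024, x=0))
    # A gshare predictor with a 10-bit global history.
    print(twolev_option(n=1, w=10, m=2 ** 10, x=1))
```

This run uses `-bpred bimod`, so the `-bpred:2lev` values in the dump are listed but not exercised. Only `bpred_bimod.*` statistics are reported.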

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
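Below is a short Python sketch of the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format. It computes total capacity as nsets * assoc * bsize. The TLB options in this dump use the same format, with the page size in the `<bsize>` field, so the same product gives the TLB reach. `CacheConfig` and `parse_cache_config` are hypothetical names. The `none`, `dl1` and `dl2` values that some options accept are not handled here.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        """Total number of blocks (entries, for a TLB)."""
        return self.nsets * self.assoc

    @property
    def capacity(self) -> int:
        """Total bytes covered: nsets * assoc * bsize."""
        return self.blocks * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r} in {spec!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(cfg.name, cfg.blocks, "blocks,", cfg.capacity // 1024, "KB")
```

For this run, dl1 and il1 are each 64 KB and 2-way with 64-byte blocks. ul2 is 512 KB and 8-way. The 32-set, 4-way dtlb holds 128 entries covering 512 KB of 4 KB pages.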

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
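The pointer values described above can be resolved mechanically. This minimal Python sketch assumes each argument is a full `<config>` string, `none`, or (on the instruction side) `dl1`/`dl2`. `resolve_hierarchy` is a hypothetical helper. Applied to this run's options, it shows il2 and dl2 both resolving to `ul2`. That is consistent with the statistics reporting a single `ul2` cache and no separate il2.

```python
def _cache_name(spec: str):
    """Name of a cache defined by '<name>:...', or None for 'none'."""
    return None if spec == "none" else spec.split(":", 1)[0]


def resolve_hierarchy(il1: str, il2: str, dl1: str, dl2: str) -> dict:
    """Return the cache name actually used at each of the four levels."""
    data = {"dl1": _cache_name(dl1), "dl2": _cache_name(dl2)}
    # il1 may point at dl1 or dl2; il2 may only point at dl2.
    il1_name = data[il1] if il1 in data else _cache_name(il1)
    il2_name = data["dl2"] if il2 == "dl2" else _cache_name(il2)
    return {"il1": il1_name, "il2": il2_name, **data}


if __name__ == "__main__":
    # Options from this run: il2 points at dl2, which is named ul2.
    print(resolve_hierarchy("il1:512:64:2:l", "dl2",
                            "dl1:512:64:2:l", "ul2:1024:64:8:l"))
```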



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6942459 # total simulation time in cycles
sim_IPC                      1.8880 # instructions per cycle
sim_CPI                      0.5297 # cycles per instruction
sim_exec_BW                  1.8940 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25837984 # cumulative IFQ occupancy
IFQ_fcount                  6333893 # cumulative IFQ full count
ifq_occupancy                3.7217 # avg IFQ occupancy (insn's)
ifq_rate                     1.8940 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9650 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 106635521 # cumulative RUU occupancy
RUU_fcount                  5770054 # cumulative RUU full count
ruu_occupancy               15.3599 # avg RUU occupancy (insn's)
ruu_rate                     1.8940 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.1098 # avg RUU occupant latency (cycle's)
ruu_full                     0.8311 # fraction of time (cycle's) RUU was full
LSQ_count                  32635962 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7009 # avg LSQ occupancy (insn's)
lsq_rate                     1.8940 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4820 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  156321425 # total number of slip cycles
avg_sim_slip                11.9264 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
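Most of the averages above are ratios of the raw counters. They can therefore be re-derived to check a run, or recomputed after merging counters from several runs. The minimal Python sketch below uses this run's values. The formulas are reconstructed from the reported numbers and agree with them to the four printed decimals, but they are not copied from the simulator source. For example, `ifq_latency` is computed as occupancy divided by the dispatch rate, which is Little's law. `RUN` and `derived_stats` are hypothetical names.

```python
# Raw counters copied from the statistics of the run above.
RUN = {
    "sim_num_insn": 13107124,
    "sim_total_insn": 13148990,
    "sim_num_branches": 1010850,
    "sim_cycle": 6942459,
    "IFQ_count": 25837984,
    "IFQ_fcount": 6333893,
    "RUU_count": 106635521,
    "RUU_fcount": 5770054,
}


def derived_stats(s: dict) -> dict:
    """Recompute the per-cycle and per-instruction averages."""
    cycles = s["sim_cycle"]
    ipc = s["sim_num_insn"] / cycles
    # Executed (mis-spec + committed) insts per cycle; equals the reported
    # ifq_rate, ruu_rate and lsq_rate in this run.
    exec_bw = s["sim_total_insn"] / cycles
    ifq_occ = s["IFQ_count"] / cycles
    ruu_occ = s["RUU_count"] / cycles
    return {
        "sim_IPC": ipc,
        "sim_CPI": 1 / ipc,
        "sim_exec_BW": exec_bw,
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "ifq_occupancy": ifq_occ,
        "ifq_full": s["IFQ_fcount"] / cycles,
        "ifq_latency": ifq_occ / exec_bw,  # Little's law: occupancy / throughput
        "ruu_occupancy": ruu_occ,
        "ruu_full": s["RUU_fcount"] / cycles,
        "ruu_latency": ruu_occ / exec_bw,
    }


if __name__ == "__main__":
    for name, value in derived_stats(RUN).items():
        print(f"{name:14s} {value:.4f}")
```

For reference, the ruu_occupancy of about 15.36 is close to the configured `-ruu:size` of 16. `LSQ_fcount` is 0, so the 32-entry LSQ never filled.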

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:10:03 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264954 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958479.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6535600 # total simulation time in cycles
sim_IPC                      1.7721 # instructions per cycle
sim_CPI                      0.5643 # cycles per instruction
sim_exec_BW                  1.8766 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19397213 # cumulative IFQ occupancy
IFQ_fcount                  4021144 # cumulative IFQ full count
ifq_occupancy                2.9679 # avg IFQ occupancy (insn's)
ifq_rate                     1.8766 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5815 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6153 # fraction of time (cycle's) IFQ was full
RUU_count                  79836389 # cumulative RUU occupancy
RUU_fcount                  3419160 # cumulative RUU full count
ruu_occupancy               12.2156 # avg RUU occupancy (insn's)
ruu_rate                     1.8766 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.5093 # avg RUU occupant latency (cycle's)
ruu_full                     0.5232 # fraction of time (cycle's) RUU was full
LSQ_count                  32706132 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0043 # avg LSQ occupancy (insn's)
lsq_rate                     1.8766 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6666 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126609630 # total number of slip cycles
avg_sim_slip                10.9320 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820213 # total number of accesses
il1.hits                   12819996 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820213 # total number of accesses
itlb.hits                  12820206 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917910 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
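The Python sketch below runs two cross-checks on the counters of this sort run. First, il2 points at dl2, and the counters are consistent with each il1 miss, dl1 miss and dl1 writeback reaching `ul2` exactly once: 217 + 11205 + 7079 = 18501, matching `ul2.accesses`. The earlier run in this section agrees as well (178 + 5727 + 492 = 6397). Second, the branch-predictor rates are hits divided by the relevant count, and `bpred_bimod.misses` equals updates minus direction hits. `SORT`, `rate` and `summarize` are hypothetical names. `rate` returns `None` wherever the simulator prints `<error: divide by zero>`.

```python
def rate(num: int, den: int):
    """Ratio of two counters; None where the simulator would print
    '<error: divide by zero>'."""
    return None if den == 0 else num / den


# Counters copied from the sort run's statistics above.
SORT = {
    "il1.misses": 217,
    "dl1.accesses": 4497783, "dl1.misses": 11205, "dl1.writebacks": 7079,
    "ul2.accesses": 18501, "ul2.misses": 4196,
    "bpred.updates": 3128464, "bpred.addr_hits": 2984765,
    "bpred.dir_hits": 2990777, "bpred.misses": 137687,
    "bpred.jr_hits": 427904, "bpred.jr_seen": 434901,
    "bpred.jr_non_ras_hits": 1223, "bpred.jr_non_ras_seen": 8193,
}


def summarize(s: dict) -> dict:
    return {
        # Assumes a unified L2: il1 misses, dl1 misses and dl1 writebacks
        # are the only sources of ul2 traffic.
        "ul2_accesses_expected": s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"],
        "dl1.miss_rate": rate(s["dl1.misses"], s["dl1.accesses"]),
        "ul2.miss_rate": rate(s["ul2.misses"], s["ul2.accesses"]),
        "bpred_addr_rate": rate(s["bpred.addr_hits"], s["bpred.updates"]),
        "bpred_dir_rate": rate(s["bpred.dir_hits"], s["bpred.updates"]),
        "bpred_jr_rate": rate(s["bpred.jr_hits"], s["bpred.jr_seen"]),
        "bpred_jr_non_ras_rate": rate(s["bpred.jr_non_ras_hits"], s["bpred.jr_non_ras_seen"]),
        "bpred_misses_expected": s["bpred.updates"] - s["bpred.dir_hits"],
    }


if __name__ == "__main__":
    out = summarize(SORT)
    assert out["ul2_accesses_expected"] == SORT["ul2.accesses"]
    assert out["bpred_misses_expected"] == SORT["bpred.misses"]
    for key, value in out.items():
        print(f"{key:24s} {value}")
```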

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:10:10 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (may be given multiple times)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
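The `-mem:lat` and `-mem:width` options above combine into a per-block main-memory latency. As far as we know, stock SimpleScalar 3.0 charges the first-chunk latency once, plus the inter-chunk latency for every further bus-width chunk of the block. The Python sketch below mirrors that formula; the function name is hypothetical, and it deliberately ignores `-mem:maxBurstLength`/`-mem:minBurstLength`, which this build accepts but which (as far as we know) are not part of the stock 3.0 option set, so their effect on these runs is not modelled.

```python
def mem_access_latency(blk_sz: int, first_chunk: int, inter_chunk: int, bus_width: int) -> int:
    """Cycles to fetch one block from main memory (stock-formula sketch)."""
    # Number of bus-width transfers needed for the block, rounded up.
    chunks = (blk_sz + bus_width - 1) // bus_width
    assert chunks > 0
    # The first chunk pays the full access latency; each further chunk adds inter_chunk.
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes; -mem:lat 72 4 and -mem:width 32 as listed above.
print(mem_access_latency(64, 72, 4, 32))  # 76
```

With a 32-byte bus a 64-byte block needs two transfers, so under this formula an ul2 miss costs 72 + 4 = 76 cycles before any burst-length effects.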

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
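The range grammar above can be sketched as a small parser. This is a hypothetical Python illustration, not SimpleScalar's own range code: numbers are read with Python's `int(x, 0)` (so `0x...` is hex), program symbols are resolved through a caller-supplied table (an assumption, since the real simulator looks them up in the loaded binary), and endpoints of different kinds are rejected, which we assume matches the simulator's behaviour.

```python
from typing import Dict, NamedTuple, Optional

KIND_BY_PREFIX = {"@": "addr", "#": "cycle", "": "insn"}


class TraceRange(NamedTuple):
    kind: Optional[str]   # "addr", "cycle", "insn", or None for "trace everything"
    start: Optional[int]  # None = open at the start
    end: Optional[int]    # None = open at the end


def _value(body: str, symbols: Dict[str, int]) -> int:
    # Hypothetical symbol table first, otherwise a decimal/hex literal.
    return symbols[body] if body in symbols else int(body, 0)


def parse_range(spec: str, symbols: Optional[Dict[str, int]] = None) -> TraceRange:
    symbols = symbols or {}
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    kind = start = end = None
    if start_txt:
        prefix = start_txt[0] if start_txt[0] in "@#" else ""
        kind = KIND_BY_PREFIX[prefix]
        start = _value(start_txt[len(prefix):], symbols)
    if end_txt:
        if end_txt[0] == "+":
            # '+' makes the end relative to the start, e.g. 1000:+100 == 1000:1100.
            if start is None:
                raise ValueError("a '+' end needs a start value")
            end = start + _value(end_txt[1:], symbols)
        else:
            prefix = end_txt[0] if end_txt[0] in "@#" else ""
            end_kind = KIND_BY_PREFIX[prefix]
            if kind is not None and end_kind != kind:
                raise ValueError("range endpoints must be of the same kind")
            kind = end_kind
            end = _value(end_txt[len(prefix):], symbols)
    return TraceRange(kind, start, end)


print(parse_range("1000:+100"))                          # insn range 1000..1100
print(parse_range("@main:+278", {"main": 0x00400140}))   # address range from main
```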

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, W, M, X  (passed to -bpred:2lev in the order N M W X)
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == N * 2^W), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
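The N, W, M, X parameters determine how a two-level predictor turns a branch address into a counter index. The Python sketch below is modelled on our reading of SimpleScalar's `bpred.c` and should be treated as an approximation: the 3-bit right shift of the branch address (PISA instructions are 8 bytes) is an assumption, and `twolev_index`/`update_history` are hypothetical names.

```python
from typing import List, Tuple

BR_SHIFT = 3  # assumption: PISA instructions are 8 bytes, so drop 3 low PC bits


def twolev_index(baddr: int, shiftregs: List[int], l1size: int, l2size: int,
                 hist_width: int, use_xor: bool) -> Tuple[int, int]:
    """Return (first-level index, second-level counter index) for a branch."""
    pc = baddr >> BR_SHIFT
    l1index = pc & (l1size - 1)          # which of the N history registers
    hist = shiftregs[l1index]            # W-bit branch history
    if use_xor:
        # gshare-style: fold address bits into the low W history bits.
        l2index = ((hist ^ pc) & ((1 << hist_width) - 1)) | (pc << hist_width)
    else:
        # Concatenate address bits above the history bits.
        l2index = hist | (pc << hist_width)
    return l1index, l2index & (l2size - 1)   # keep log2(M) bits


def update_history(shiftregs: List[int], l1index: int, taken: bool, hist_width: int) -> None:
    """Shift the branch outcome into the selected W-bit history register."""
    mask = (1 << hist_width) - 1
    shiftregs[l1index] = ((shiftregs[l1index] << 1) | int(taken)) & mask


# -bpred:2lev 1 1024 8 0: one global 8-bit history, 1024 counters, no xor.
regs = [0b10110011]
print(twolev_index(0x00400148, regs, 1, 1024, 8, False))  # (0, 435)
update_history(regs, 0, True, 8)
print(bin(regs[0]))  # 0b1100111
```

With M = 1024 > 2^8, two address bits sit above the 8 history bits and select one of four pattern tables, which is the GAp shape in the sample list; setting X = 1 and M = 2^W gives gshare.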

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
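Since caches and TLBs share the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, capacity follows directly as nsets times assoc times bsize (for a TLB, bsize is the page size, so the product is the TLB's address reach). A minimal Python sketch with hypothetical names; the power-of-two checks are an assumption about what the simulator enforces.

```python
from typing import NamedTuple

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


def _is_pow2(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # bytes per block (bytes per page for a TLB)
    assoc: int
    repl: str    # 'l', 'f' or 'r'

    @property
    def blocks(self) -> int:
        # Total block frames (or TLB entries).
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        # Data capacity for a cache, or address reach for a TLB.
        return self.blocks * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    # Assumed sanity checks: set count and block size must be powers of two.
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)):
        raise ValueError("nsets and bsize must be powers of two")
    return cfg


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.blocks, cfg.capacity_bytes // 1024, "KiB", REPLACEMENT[cfg.repl])
```

For the runs in this log, dl1 and il1 each hold 64 KiB, ul2 holds 512 KiB, and the 128-entry dtlb covers 512 KiB of address space.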



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1024685.2308 # simulation speed (in insts/sec)
sim_total_insn             13375617 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10532381 # total simulation time in cycles
sim_IPC                      1.2648 # instructions per cycle
sim_CPI                      0.7907 # cycles per instruction
sim_exec_BW                  1.2700 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
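The headline ratios above are plain quotients of the raw counters. The Python sketch below (hypothetical function name) recomputes them from this run's numbers; the difference between executed and committed instructions is the wrong-path work that `-issue:wrongpath true` lets the core issue.

```python
def pipeline_ratios(num_insn: int, total_insn: int, num_branches: int,
                    cycles: int, elapsed_s: int) -> dict:
    return {
        "sim_IPC": num_insn / cycles,          # committed instructions per cycle
        "sim_CPI": cycles / num_insn,          # reciprocal of IPC
        "sim_exec_BW": total_insn / cycles,    # committed + mis-speculated per cycle
        "sim_IPB": num_insn / num_branches,    # committed instructions per branch
        "sim_inst_rate": num_insn / elapsed_s, # host simulation speed (insts/sec)
    }


stats = pipeline_ratios(13320908, 13375617, 387182, 10532381, 13)
for key, value in stats.items():
    print(f"{key:15s} {value:.4f}")
print("squashed insts:", 13375617 - 13320908)  # 54709
```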
IFQ_count                  40979678 # cumulative IFQ occupancy
IFQ_fcount                 10094275 # cumulative IFQ full count
ifq_occupancy                3.8908 # avg IFQ occupancy (insns)
ifq_rate                     1.2700 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.0638 # avg IFQ occupant latency (cycles)
ifq_full                     0.9584 # fraction of time (cycles) IFQ was full
RUU_count                 164845908 # cumulative RUU occupancy
RUU_fcount                  9953233 # cumulative RUU full count
ruu_occupancy               15.6513 # avg RUU occupancy (insns)
ruu_rate                     1.2700 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.3244 # avg RUU occupant latency (cycles)
ruu_full                     0.9450 # fraction of time (cycles) RUU was full
LSQ_count                  87869122 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3428 # avg LSQ occupancy (insns)
lsq_rate                     1.2700 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5694 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  272429127 # total number of slip cycles
avg_sim_slip                20.4512 # the average slip between issue and retirement
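The per-queue averages above can be reproduced from the cumulative counters: occupancy and full fraction divide by cycles, while latency divides by executed instructions, so the three satisfy Little's law (occupancy = rate x latency). This Python sketch (hypothetical names) checks that against this run; the formulas are inferred from the printed values rather than taken from the simulator source.

```python
def queue_stats(count: int, fcount: int, total_insn: int, cycles: int) -> dict:
    """Derive the printed per-queue averages from the cumulative counters."""
    occupancy = count / cycles       # average entries resident per cycle
    rate = total_insn / cycles       # instructions dispatched per cycle
    latency = count / total_insn     # average cycles an instruction stays
    full = fcount / cycles           # fraction of cycles the queue was full
    # Little's law: occupancy = rate * latency (equal up to float rounding).
    assert abs(occupancy - rate * latency) <= 1e-9 * occupancy
    return {"occupancy": occupancy, "rate": rate, "latency": latency, "full": full}


CYCLES, TOTAL_INSN, NUM_INSN = 10532381, 13375617, 13320908
for name, count, fcount in [("ifq", 40979678, 10094275),
                            ("ruu", 164845908, 9953233),
                            ("lsq", 87869122, 0)]:
    s = queue_stats(count, fcount, TOTAL_INSN, CYCLES)
    print(name, {k: round(v, 4) for k, v in s.items()})

# avg_sim_slip divides total slip cycles by *committed* instructions.
print(round(272429127 / NUM_INSN, 4))  # 20.4512
```

With a 16-entry RUU averaging 15.65 entries and full 94.5% of the time, the RUU is the structure most likely to be limiting dispatch in this configuration.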
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
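The predictor rates are hit counts over updates (or over JRs seen), and in both runs of this log the printed misses equal updates minus direction hits. The Python sketch below (hypothetical names) recomputes them and returns `None` where sim-outorder prints `<error: divide by zero>`, as happens for the non-RAS JR rate of the `filter` run.

```python
from typing import Dict, Optional


def rate(num: int, den: int) -> Optional[float]:
    # sim-outorder prints "<error: divide by zero>" for an empty denominator;
    # this sketch returns None instead.
    return num / den if den else None


def bpred_summary(updates: int, addr_hits: int, dir_hits: int,
                  jr_hits: int, jr_seen: int,
                  jr_non_ras_hits: int, jr_non_ras_seen: int) -> Dict[str, Optional[float]]:
    return {
        # dir_hits already includes addr_hits; this identity holds for both runs here.
        "misses": updates - dir_hits,
        "bpred_addr_rate": rate(addr_hits, updates),
        "bpred_dir_rate": rate(dir_hits, updates),
        "bpred_jr_rate": rate(jr_hits, jr_seen),
        "bpred_jr_non_ras_rate": rate(jr_non_ras_hits, jr_non_ras_seen),
    }


run1 = bpred_summary(387182, 380510, 380716, 89850, 89860, 0, 1)
run2 = bpred_summary(1647342, 1638967, 1639058, 45, 52, 0, 0)  # the `filter` run
print(run1["misses"], round(run1["bpred_dir_rate"], 4))   # 6466 0.9833
print(run2["bpred_jr_non_ras_rate"])                      # None
```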
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
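Each cache's rates are per-access quotients, and the unified L2 traffic can be cross-checked against L1: in both runs of this log, ul2.accesses equals dl1 misses plus dl1 writebacks plus il1 misses, consistent with il2 being pointed at dl2. The Python sketch below (hypothetical names) recomputes both.

```python
def cache_rates(accesses: int, hits: int, misses: int,
                replacements: int, writebacks: int) -> dict:
    assert hits + misses == accesses, "hits and misses should partition accesses"
    return {
        "miss_rate": misses / accesses,
        "repl_rate": replacements / accesses,
        "wb_rate": writebacks / accesses,
    }


def ul2_accesses_expected(dl1_misses: int, dl1_writebacks: int, il1_misses: int) -> int:
    # As we read the counters: each L1 miss fetches a block from ul2, and each
    # dirty dl1 eviction writes one block back to ul2.
    return dl1_misses + dl1_writebacks + il1_misses


dl1 = cache_rates(6169439, 5881717, 287722, 286698, 143444)
ul2 = cache_rates(431893, 399499, 32394, 24202, 19002)
print({k: round(v, 4) for k, v in dl1.items()})
print({k: round(v, 4) for k, v in ul2.items()})
print(ul2_accesses_expected(287722, 143444, 727))  # 431893, matches ul2.accesses
```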
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
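The memory statistics are consistent with 4 KiB pages (190 pages give 760k here, and 30 pages give 120k in the `filter` run), and the page-table miss rate is misses over accesses. The loader values also place the environment base exactly one initial-stack-size below the stack base. The Python sketch below checks these relations; the page size is an assumption inferred from the numbers, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size; consistent with the page counts in this log


def page_mem_kib(page_count: int) -> int:
    return page_count * PAGE_SIZE // 1024


def ptab_miss_rate(misses: int, accesses: int) -> float:
    return misses / accesses


print(page_mem_kib(190), page_mem_kib(30))            # 760 120
print(round(ptab_miss_rate(12303390, 178886464), 4))  # 0.0688

# Layout check: the environment block sits directly below the initial stack.
stack_base, stack_size, environ_base = 0x7fffc000, 16384, 0x7fff8000
assert stack_base - stack_size == environ_base
```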

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:10:23 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (may be given multiple times)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12717662 # total simulation time in cycles
sim_IPC                      1.6811 # instructions per cycle
sim_CPI                      0.5949 # cycles per instruction
sim_exec_BW                  1.6866 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49677158 # cumulative IFQ occupancy
IFQ_fcount                 11800117 # cumulative IFQ full count
ifq_occupancy                3.9062 # avg IFQ occupancy (insns)
ifq_rate                     1.6866 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3159 # avg IFQ occupant latency (cycles)
ifq_full                     0.9279 # fraction of time (cycles) IFQ was full
RUU_count                 201848297 # cumulative RUU occupancy
RUU_fcount                 12579992 # cumulative RUU full count
ruu_occupancy               15.8715 # avg RUU occupancy (insns)
ruu_rate                     1.6866 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.4101 # avg RUU occupant latency (cycles)
ruu_full                     0.9892 # fraction of time (cycles) RUU was full
LSQ_count                  64694819 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0870 # avg LSQ occupancy (insns)
lsq_rate                     1.6866 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0161 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  294325545 # total number of slip cycles
avg_sim_slip                13.7669 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
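The `filter` run's ul2 miss rate of 0.6069 looks alarming, but ul2 recorded zero replacements and zero invalidations. If nothing was ever evicted, every miss filled an empty frame with a block that had never been resident, so all 2489 misses are compulsory (cold) misses and the miss count measures the program's L2 footprint. A Python sketch of that reasoning (hypothetical names):

```python
from typing import Optional


def compulsory_footprint(accesses: int, hits: int, misses: int,
                         replacements: int, invalidations: int,
                         nsets: int, assoc: int, bsize: int) -> Optional[int]:
    """Bytes of distinct blocks brought into the cache, if every miss was cold."""
    assert hits + misses == accesses
    if replacements or invalidations:
        # Once blocks are evicted, later misses may be capacity or conflict misses.
        return None
    # With nothing ever evicted, each miss fetched a never-before-resident block.
    assert misses <= nsets * assoc
    return misses * bsize


# ul2 for the `filter` run: ul2:1024:64:8:l
print(compulsory_footprint(4101, 1612, 2489, 0, 0, nsets=1024, assoc=8, bsize=64))  # 159296
```

About 156 KiB of distinct data and code reached ul2, well under its 512 KiB capacity, so a larger L2 would not reduce these misses.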
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:10:36 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
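
  The -mem:lat <first_chunk> <inter_chunk> and -mem:width options above combine into a main-memory access latency. The sketch below assumes the stock SimpleScalar 3.0 model. In that model a block moves over the bus in ceil(block size / bus width) chunks. The first chunk costs first_chunk cycles and each later chunk costs inter_chunk cycles. This build also accepts -mem:maxBurstLength and -mem:minBurstLength, which the sketch does not model, so treat its result as an estimate. mem_access_latency is an illustrative helper, not a simulator API.

```python
def mem_access_latency(block_size: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Approximate cycles to fetch one block from main memory.

    Assumes the stock SimpleScalar 3.0 chunk model; the burst-length
    options of this build are NOT modeled.
    """
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_size // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # 64-byte ul2 blocks with -mem:lat 72 4, at 32- and 64-byte bus widths
    for width in (32, 64):
        print(width, mem_access_latency(64, 72, 4, width))
```

  For the 64-byte ul2 blocks configured here, the sketch gives 76 cycles with the 32-byte bus above and 72 cycles with a 64-byte bus.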

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
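
  A minimal Python sketch of the range grammar described above follows. It rests on three assumptions. Numbers are written in decimal or 0x-prefixed hex. Both endpoints must be of the same kind. Program symbols are resolved through a caller-supplied mapping. The symbols argument and the address given for main in the example are hypothetical stand-ins for the program's symbol table.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX_KIND = {"@": "address", "#": "cycle"}  # no prefix: instruction count


@dataclass(frozen=True)
class TraceRange:
    kind: str             # "address", "cycle" or "instruction"
    start: Optional[int]  # None: open start
    end: Optional[int]    # None: open end


def _split_kind(tok: str) -> tuple[str, str]:
    """Strip an '@' or '#' prefix and return (kind, remaining text)."""
    if tok[0] in PREFIX_KIND:
        return PREFIX_KIND[tok[0]], tok[1:]
    return "instruction", tok


def _value(text: str, symbols: Mapping[str, int]) -> int:
    """A program symbol from the mapping, else a decimal or 0x-hex literal."""
    return symbols[text] if text in symbols else int(text, 0)


def parse_range(spec: str,
                symbols: Optional[Mapping[str, int]] = None) -> TraceRange:
    symbols = symbols or {}
    if spec.count(":") != 1:
        raise ValueError(f"expected exactly one ':' in {spec!r}")
    lo, hi = spec.split(":")
    kind: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None
    if lo:
        kind, text = _split_kind(lo)
        start = _value(text, symbols)
    if hi.startswith("+"):
        # relative end: same kind as the start, offset from it
        if start is None:
            raise ValueError("a '+' end needs a start to be relative to")
        end = start + _value(hi[1:], symbols)
    elif hi:
        end_kind, text = _split_kind(hi)
        if kind is not None and end_kind != kind:
            raise ValueError(f"range endpoints differ in kind: {spec!r}")
        kind = end_kind
        end = _value(text, symbols)
    return TraceRange(kind or "instruction", start, end)


if __name__ == "__main__":
    # 'main' -> 0x00400200 is a hypothetical symbol-table entry
    for s in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100", "@main:+278"):
        print(s, parse_range(s, {"main": 0x00400200}))
```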

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
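
  The hypothetical Python model below makes the N, M, W, X parameters concrete. It is a sketch, not SimpleScalar's bpred.c. It assumes the second-level index is built from the W-bit history, XOR-ed with the branch address when X = 1, with branch-address bits placed above it and the result masked to M entries. It also assumes the low 3 PC bits are dropped (br_shift, an assumption for 8-byte PISA instructions) and that counters start weakly taken. Under these assumptions M == 2^W masks the upper address bits away. GAg therefore ignores the branch address entirely, while gshare folds it in only through the XOR.

```python
class TwoLevelPredictor:
    """Hypothetical 2-level predictor parameterised like -bpred:2lev (N M W X).

    N: first-level shift registers, M: second-level 2-bit counters,
    W: history width, X: XOR history with address bits when True.
    """

    def __init__(self, n: int, m: int, w: int, xor: bool, br_shift: int = 3):
        for v in (n, m):
            if v <= 0 or v & (v - 1):
                raise ValueError("N and M must be powers of two")
        self.n, self.m, self.w, self.xor = n, m, w, xor
        self.br_shift = br_shift     # assumed: drop low PC bits (8-byte insts)
        self.history = [0] * n       # N shift registers, W bits each
        self.counters = [2] * m      # M 2-bit counters, assumed weakly taken

    def _l1_index(self, pc: int) -> int:
        return (pc >> self.br_shift) & (self.n - 1)

    def _l2_index(self, pc: int) -> int:
        addr = pc >> self.br_shift
        hist = self.history[self._l1_index(pc)]
        if self.xor:
            # gshare-style: XOR the address into the low W history bits
            low = (hist ^ addr) & ((1 << self.w) - 1)
        else:
            low = hist
        # address bits sit above the W history bits; M == 2^W masks them off
        return (low | (addr << self.w)) & (self.m - 1)

    def predict(self, pc: int) -> bool:
        return self.counters[self._l2_index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._l2_index(pc)       # index with the pre-update history
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        j = self._l1_index(pc)
        self.history[j] = ((self.history[j] << 1) | int(taken)) & ((1 << self.w) - 1)


if __name__ == "__main__":
    p = TwoLevelPredictor(n=1, m=4, w=2, xor=False)  # GAg: 1, 2^W, W, 0 with W = 2
    pc = 0x00400100                                   # hypothetical branch address
    correct = 0
    for i in range(40):
        taken = i % 2 == 0                            # alternating T, N, T, N, ...
        if i >= 20 and p.predict(pc) == taken:
            correct += 1
        p.update(pc, taken)
    print(correct, "of 20")  # prints "20 of 20"
```

  After a couple of warm-up branches, the GAg instance predicts the alternating branch perfectly. A single 2-bit bimodal counter cannot predict that pattern reliably.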

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
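
  A <config> string is easy to check by hand or in code. The minimal Python sketch below assumes that <nsets> and <bsize> must be powers of two, which holds for every configuration in this log, and that <bsize> is in bytes. For a TLB, <bsize> is the page size, so the same product gives the TLB reach. parse_cache_config and CacheConfig are illustrative names.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # bytes per block (for a TLB: bytes per page)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # for a TLB this is the reach: entries * page size
        return self.nsets * self.bsize * self.assoc

    def split_address(self, addr: int) -> tuple[int, int, int]:
        """Split addr into (tag, set index, block offset)."""
        offset_bits = self.bsize.bit_length() - 1
        index_bits = self.nsets.bit_length() - 1
        offset = addr & (self.bsize - 1)
        index = (addr >> offset_bits) & (self.nsets - 1)
        tag = addr >> (offset_bits + index_bits)
        return tag, index, offset


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)
    for label, v in (("nsets", cfg.nsets), ("bsize", cfg.bsize)):
        if v <= 0 or v & (v - 1):
            raise ValueError(f"{label} must be a power of two, got {v}")
    if cfg.assoc <= 0:
        raise ValueError("assoc must be positive")
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return cfg


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(cfg.name, cfg.capacity_bytes // 1024, "KiB", REPLACEMENT[cfg.repl])
```

  By this arithmetic, the runs in this log use a 64 KiB 2-way L1 data cache, a 64 KiB 2-way L1 instruction cache and a 512 KiB 8-way unified L2. The 128-entry data TLB covers 512 KiB.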

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
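
  The small sketch below shows how the pointer arguments resolve. It assumes every level is given explicitly, because the simulator's own defaults are not modeled; this log shows -cache:il2 dl2 even though that option is not on the command line. The allowed targets follow the option help: il1 may point at dl1 or dl2, and il2 only at dl2. resolve_hierarchy is a hypothetical helper.

```python
from typing import Optional

# allowed pointer targets, per the option help: il1 -> {dl1|dl2}, il2 -> {dl2}
POINTERS = {"il1": {"dl1", "dl2"}, "il2": {"dl2"}}


def resolve_hierarchy(opts: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each level (il1, il2, dl1, dl2) to the name of the cache serving it."""
    def own(level: str) -> Optional[str]:
        spec = opts.get(level, "none")
        return None if spec == "none" else spec.split(":", 1)[0]

    resolved: dict[str, Optional[str]] = {"dl1": own("dl1"), "dl2": own("dl2")}
    for level in ("il1", "il2"):
        spec = opts.get(level, "none")
        if spec in ("dl1", "dl2"):
            # instruction level shares the data cache it points at
            if spec not in POINTERS[level]:
                raise ValueError(f"-cache:{level} cannot point at {spec}")
            if resolved[spec] is None:
                raise ValueError(f"-cache:{level} points at {spec}, which is none")
            resolved[level] = resolved[spec]
        else:
            resolved[level] = own(level)
    return resolved


if __name__ == "__main__":
    # the cache options shown in this log, with il2 listed explicitly
    log_opts = {"il1": "il1:512:64:2:l", "il2": "dl2",
                "dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l"}
    print(resolve_hierarchy(log_opts))
```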



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 25 # total simulation time in seconds
sim_inst_rate          1114387.6800 # simulation speed (in insts/sec)
sim_total_insn             27861747 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  40417941 # total simulation time in cycles
sim_IPC                      0.6893 # instructions per cycle
sim_CPI                      1.4508 # cycles per instruction
sim_exec_BW                  0.6893 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 161543945 # cumulative IFQ occupancy
IFQ_fcount                 40385746 # cumulative IFQ full count
ifq_occupancy                3.9968 # avg IFQ occupancy (insn's)
ifq_rate                     0.6893 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.7981 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 646181180 # cumulative RUU occupancy
RUU_fcount                 40384886 # cumulative RUU full count
ruu_occupancy               15.9875 # avg RUU occupancy (insn's)
ruu_rate                     0.6893 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.1924 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 196333291 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8576 # avg LSQ occupancy (insn's)
lsq_rate                     0.6893 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.0467 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  879022882 # total number of slip cycles
avg_sim_slip                31.5518 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
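
A minimal sketch for post-processing one run's statistics follows. It parses the "name value # description" lines and recomputes IPC, CPI, execution bandwidth and cache miss rates from the raw counters. On this run it reproduces sim_IPC 0.6893, sim_CPI 1.4508 and dl1.miss_rate 0.2972 above. It assumes the text of a single run is passed in; with several runs concatenated, later values overwrite earlier ones. parse_stats and derived_metrics are illustrative names.

```python
from typing import Union

Stat = Union[float, str]


def parse_stats(text: str) -> dict[str, Stat]:
    """Parse 'name value # description' lines from a statistics section.

    Values that are not plain numbers (hex addresses, '1480k',
    '<error: divide by zero>') are kept as strings.
    """
    stats: dict[str, Stat] = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()
        parts = body.split(None, 1)
        if len(parts) != 2 or parts[0].endswith(":"):
            continue  # blank lines and 'sim: ...' banners
        name, raw = parts
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw.strip()
    return stats


def derived_metrics(stats: dict[str, Stat]) -> dict[str, float]:
    """Recompute a few headline ratios from the raw counters."""
    def num(key: str) -> float:
        v = stats[key]
        if isinstance(v, str):
            raise ValueError(f"{key} is not numeric: {v!r}")
        return v

    out = {
        "IPC": num("sim_num_insn") / num("sim_cycle"),
        "CPI": num("sim_cycle") / num("sim_num_insn"),
        "exec_BW": num("sim_total_insn") / num("sim_cycle"),
    }
    for cache in ("il1", "dl1", "ul2", "itlb", "dtlb"):
        key = f"{cache}.accesses"
        if key in stats and num(key) > 0:
            out[f"{cache}.miss_rate"] = num(f"{cache}.misses") / num(key)
    return out


if __name__ == "__main__":
    excerpt = """\
sim_num_insn               27859692 # total number of instructions committed
sim_total_insn             27861747 # total number of instructions executed
sim_cycle                  40417941 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
"""
    for key, value in derived_metrics(parse_stats(excerpt)).items():
        print(f"{key:14s} {value:.4f}")
```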

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:11:01 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6801123 # total simulation time in cycles
sim_IPC                      1.9272 # instructions per cycle
sim_CPI                      0.5189 # cycles per instruction
sim_exec_BW                  1.9334 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25323040 # cumulative IFQ occupancy
IFQ_fcount                  6205157 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insn's)
ifq_rate                     1.9334 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9259 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104570129 # cumulative RUU occupancy
RUU_fcount                  5641318 # cumulative RUU full count
ruu_occupancy               15.3754 # avg RUU occupancy (insn's)
ruu_rate                     1.9334 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9527 # avg RUU occupant latency (cycle's)
ruu_full                     0.8295 # fraction of time (cycle's) RUU was full
LSQ_count                  31945914 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6972 # avg LSQ occupancy (insn's)
lsq_rate                     1.9334 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4295 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153567281 # total number of slip cycles
avg_sim_slip                11.7163 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
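
The <error: divide by zero> printed for bpred_bimod.bpred_jr_non_ras_rate.PP follows from the counters. In this run jr_non_ras_seen.PP is 0 while jr_seen is 52, so the denominator must be the non-RAS count. The sketch below recomputes the predictor rates from this run's counters and returns None where the simulator prints the error. It also checks that misses equals updates minus dir_hits, which holds for both runs in this log; that relation is treated as an observation, not a documented invariant. bpred_rates and misses_consistent are hypothetical helpers, and the counter names are taken from the output above.

```python
from typing import Optional


def ratio(num: int, den: int) -> Optional[float]:
    """num/den, or None where sim-outorder prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


def bpred_rates(c: dict[str, int]) -> dict[str, Optional[float]]:
    """Recompute bpred_bimod.* rates; keys are counter names without prefix/suffix."""
    return {
        "bpred_addr_rate": ratio(c["addr_hits"], c["updates"]),
        "bpred_dir_rate": ratio(c["dir_hits"], c["updates"]),
        "bpred_jr_rate": ratio(c["jr_hits"], c["jr_seen"]),
        "bpred_jr_non_ras_rate": ratio(c["jr_non_ras_hits"], c["jr_non_ras_seen"]),
        "ras_rate": ratio(c["ras_hits"], c["used_ras"]),
    }


def misses_consistent(c: dict[str, int]) -> bool:
    """True when misses == updates - dir_hits (observed for both runs in this log)."""
    return c["misses"] == c["updates"] - c["dir_hits"]


if __name__ == "__main__":
    # counters copied from the bpred_bimod.* lines of the run above
    run = {"updates": 1010850, "addr_hits": 1000567, "dir_hits": 1000659,
           "misses": 10191, "jr_hits": 46, "jr_seen": 52,
           "jr_non_ras_hits": 0, "jr_non_ras_seen": 0,
           "ras_hits": 46, "used_ras": 52}
    for name, value in bpred_rates(run).items():
        print(name, "<undefined>" if value is None else f"{value:.4f}")
    print("misses consistent:", misses_consistent(run))
```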

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:11:09 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
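  Each active option above is printed as `-name value # description', while a few (for example -config and -redir:sim) are printed commented out with a leading `#'. The Python sketch below assumes that layout holds for every line. It turns a dump into a dictionary so two runs can be compared. The names parse_config_dump and diff_configs are hypothetical helpers, not part of SimpleScalar. The example compares this run's memory options with the ones on the command line of the first run in this log (-mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2).

```python
def parse_config_dump(lines):
    """Map each active option name to its value string (hypothetical helper)."""
    options = {}
    for line in lines:
        stripped = line.strip()
        # Active options start with '-'; commented-out ('#') and blank lines are skipped.
        if not stripped.startswith("-"):
            continue
        # Text after the first '#' is the human-readable description.
        setting = stripped.split("#", 1)[0]
        name, _, value = setting.strip().partition(" ")
        # Collapse the column padding so "72 4" compares equal regardless of spacing.
        options[name] = " ".join(value.split())
    return options


def diff_configs(old, new):
    """Return {option: (old_value, new_value)} for every option that differs."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


this_run = parse_config_dump([
    "-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)",
    "-mem:width                 64 # memory access bus width (in bytes)",
    "-mem:minBurstLength            4 # minimum memory burst length",
    "# -redir:sim       tempOutput # redirect simulator output to file",
])
first_run = {"-mem:lat": "91 10", "-mem:width": "8", "-mem:minBurstLength": "2"}
for option, (before, after) in diff_configs(first_run, this_run).items():
    print(f"{option}: {before} -> {after}")
```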

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
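  The range grammar above can be expressed as a small parser. This Python sketch is an illustration that depends on several assumptions and is not SimpleScalar's own parser. It reads numbers with Python's int(text, 0), so it accepts decimal or 0x-prefixed hex. It keeps anything non-numeric as an unresolved program symbol. It folds a `+' offset into the start only when the start is numeric. The names parse_ptrace_range and Bound are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple, Union


class Bound(NamedTuple):
    kind: str               # "addr" ('@'), "cycle" ('#'), "inst" (no prefix), "rel" ('+', end only)
    value: Union[int, str]  # int when numeric, else an unresolved program symbol


_PREFIX_KINDS = {"@": "addr", "#": "cycle", "+": "rel"}


def _parse_bound(text: str, prefixes: str) -> Optional[Bound]:
    if not text:
        return None  # this end of the range was omitted
    kind = "inst"
    if text[0] in prefixes:
        kind, text = _PREFIX_KINDS[text[0]], text[1:]
    try:
        value: Union[int, str] = int(text, 0)
    except ValueError:
        value = text  # e.g. "main" in @main:+278
    return Bound(kind, value)


def parse_ptrace_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"pipetrace range needs a ':': {spec!r}")
    start = _parse_bound(start_text, "@#")  # the start may use '@' or '#'
    end = _parse_bound(end_text, "@#+")     # the end may also use '+'
    # Resolve a relative end against a numeric start: 1000:+100 == 1000:1100.
    if (end is not None and end.kind == "rel" and isinstance(end.value, int)
            and start is not None and isinstance(start.value, int)):
        end = Bound(start.kind, start.value + end.value)
    return start, end


for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"]:
    print(spec, parse_ptrace_range(spec))
```

  A symbolic start such as @main leaves the `+278' end unresolved. Resolving it would need the program's symbol table, which this sketch does not model.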

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X  (the <l1size> <l2size> <hist_size> <xor>
                      arguments of -bpred:2lev, in that order)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
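  The Python below shows how the four numbers fit together. It builds -bpred:2lev argument tuples in <l1size> <l2size> <hist_size> <xor> order for the sample schemes. It also shows one way a W-bit history can index an M-entry second-level table. The indexing rule is an illustrative assumption, not SimpleScalar's own code. It optionally xors the history with the branch address, places higher address bits above the history, then masks to M entries. This is consistent with the M > 2^W and xor descriptions above. twolev_args and l2_index are hypothetical names. branch_addr is assumed to be already shifted to drop the instruction-alignment bits.

```python
def twolev_args(scheme, w, n=1, m=None):
    """Return (N, M, W, X) in -bpred:2lev argument order (hypothetical helper).

    w is the history width W, n the number of first-level shift registers N,
    and m the second-level table size M (only needed for GAp).
    """
    if scheme == "GAg":
        return (1, 2 ** w, w, 0)
    if scheme == "GAp":
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        return (1, m, w, 0)
    if scheme == "PAg":
        return (n, 2 ** w, w, 0)
    if scheme == "PAp":
        return (n, n * 2 ** w, w, 0)
    if scheme == "gshare":
        return (1, 2 ** w, w, 1)
    raise ValueError(f"unknown scheme {scheme!r}")


def l2_index(history, branch_addr, m, w, xor):
    """Illustrative second-level index for an M-entry table (M a power of two)."""
    w_mask = (1 << w) - 1
    hist = history & w_mask                    # keep the W most recent outcomes
    if xor:
        hist = (hist ^ branch_addr) & w_mask   # gshare-style hashing
    # Address bits above the history select among 2^W-entry pattern groups
    # when M > 2^W; with M == 2^W the final mask discards them.
    return (hist | (branch_addr << w)) & (m - 1)


# The -bpred:2lev 1 1024 8 0 value in the dump has the GAp shape (1024 > 2^8).
print(twolev_args("GAp", 8, m=1024))
print("-bpred:2lev", " ".join(map(str, twolev_args("gshare", 10))))
```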

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
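  A cache configured this way holds <nsets> * <assoc> blocks, so its capacity is <nsets> * <bsize> * <assoc> bytes. For the TLBs, <bsize> appears to be the page size (4096 here), so the same product gives the address range the TLB can map. The Python sketch below parses the format and applies that arithmetic to the caches and TLBs configured in this log. The names parse_cache_config and CacheConfig are hypothetical.

```python
from typing import NamedTuple

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (assumed to be the page size for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        """Total blocks (TLB entries) = sets * ways."""
        return self.nsets * self.assoc

    @property
    def capacity(self) -> int:
        """Bytes held (or, for a TLB, bytes mapped) = blocks * block size."""
        return self.blocks * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


for spec in ["il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"]:
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.blocks} blocks, {cfg.capacity // 1024} KiB, {cfg.repl}")
```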

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
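  The pointer arguments name an existing data-side cache, so a lookup is enough to find which cache actually serves instruction fetches. The Python sketch below relies on what the option descriptions above state. The il1 argument accepts dl1, dl2, none, or a <config> string. The il2 argument accepts dl2, none, or a <config> string. resolve_icache is a hypothetical helper that returns cache names only. It does not model how the simulator wires the levels together.

```python
def resolve_icache(il1_opt, il2_opt, dl1_opt, dl2_opt):
    """Return the names of the caches serving instruction fetch at L1 and L2.

    Each *_opt is the string given for the corresponding -cache: option;
    None in the result means that level is absent ("none").
    """
    def config_name(opt):
        return None if opt == "none" else opt.split(":", 1)[0]

    data_side = {"dl1": config_name(dl1_opt), "dl2": config_name(dl2_opt)}

    def resolve(opt, pointers):
        if opt == "none":
            return None
        if opt in pointers:
            return data_side[opt]  # shared with the data-cache hierarchy
        if ":" not in opt:
            raise ValueError(f"{opt!r} is neither a <config> nor one of {pointers}")
        return config_name(opt)

    return resolve(il1_opt, ("dl1", "dl2")), resolve(il2_opt, ("dl2",))


# The runs in this log: separate L1 caches, il2 pointed at the unified ul2.
print(resolve_icache("il1:512:64:2:l", "dl2", "dl1:512:64:2:l", "ul2:1024:64:8:l"))
# The fully unified example above: il1 pointed at dl1.
print(resolve_icache("dl1", "dl2", "ul1:256:32:1:l", "ul2:1024:64:2:l"))
```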



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264665 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6339581 # total simulation time in cycles
sim_IPC                      1.8269 # instructions per cycle
sim_CPI                      0.5474 # cycles per instruction
sim_exec_BW                  1.9346 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18674483 # cumulative IFQ occupancy
IFQ_fcount                  3840461 # cumulative IFQ full count
ifq_occupancy                2.9457 # avg IFQ occupancy (insn's)
ifq_rate                     1.9346 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5226 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6058 # fraction of time (cycle's) IFQ was full
RUU_count                  76943818 # cumulative RUU occupancy
RUU_fcount                  3238545 # cumulative RUU full count
ruu_occupancy               12.1371 # avg RUU occupancy (insn's)
ruu_rate                     1.9346 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2736 # avg RUU occupant latency (cycle's)
ruu_full                     0.5108 # fraction of time (cycle's) RUU was full
LSQ_count                  32053684 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0561 # avg LSQ occupancy (insn's)
lsq_rate                     1.9346 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6135 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123064598 # total number of slip cycles
avg_sim_slip                10.6260 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
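
Several of the statistics above are ratios of raw counters, so they can be recomputed as a consistency check. The formulas in the sketch below are inferred from the stat descriptions rather than copied from sim-outorder. For this run they agree with the printed values to within the 4-decimal rounding. In particular, ruu_occupancy = RUU_count / sim_cycle, ruu_rate = sim_total_insn / sim_cycle, and ruu_latency = RUU_count / sim_total_insn. It follows that occupancy = rate * latency (Little's law) holds by construction. parse_number, parse_stats and recompute are hypothetical helpers. The excerpt is copied from this run.

```python
RUN_STATS = """\
sim_num_insn               11581513 # total number of instructions committed
sim_total_insn             12264665 # total number of instructions executed
sim_cycle                   6339581 # total simulation time in cycles
sim_IPC                      1.8269 # instructions per cycle
sim_CPI                      0.5474 # cycles per instruction
RUU_count                  76943818 # cumulative RUU occupancy
ruu_occupancy               12.1371 # avg RUU occupancy (insn's)
ruu_rate                     1.9346 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2736 # avg RUU occupant latency (cycle's)
sim_slip                  123064598 # total number of slip cycles
avg_sim_slip                10.6260 # the average slip between issue and retirement
mem.page_mem                   292k # total size of memory pages allocated
"""


def parse_number(raw):
    """int for counters and 0x... addresses, float for ratios, else the raw string."""
    for convert in (lambda s: int(s, 0), float):
        try:
            return convert(raw)
        except ValueError:
            pass
    return raw  # e.g. "292k"


def parse_stats(text):
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2:  # "name value # description"
            stats[fields[0]] = parse_number(fields[1])
    return stats


def recompute(s):
    """Derived statistics recomputed from raw counters (formulas inferred)."""
    cycles, committed = s["sim_cycle"], s["sim_num_insn"]
    return {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "ruu_occupancy": s["RUU_count"] / cycles,
        "ruu_rate": s["sim_total_insn"] / cycles,
        "ruu_latency": s["RUU_count"] / s["sim_total_insn"],
        "avg_sim_slip": s["sim_slip"] / committed,
    }


stats = parse_stats(RUN_STATS)
for name, value in recompute(stats).items():
    print(f"{name:14s} recomputed {value:9.4f}   logged {stats[name]:9.4f}")
```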

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:11:17 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375044 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9180775 # total simulation time in cycles
sim_IPC                      1.4510 # instructions per cycle
sim_CPI                      0.6892 # cycles per instruction
sim_exec_BW                  1.4569 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  35778891 # cumulative IFQ occupancy
IFQ_fcount                  8794078 # cumulative IFQ full count
ifq_occupancy                3.8972 # avg IFQ occupancy (insn's)
ifq_rate                     1.4569 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6750 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 144025965 # cumulative RUU occupancy
RUU_fcount                  8653177 # cumulative RUU full count
ruu_occupancy               15.6878 # avg RUU occupancy (insn's)
ruu_rate                     1.4569 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.7683 # avg RUU occupant latency (cycle's)
ruu_full                     0.9425 # fraction of time (cycle's) RUU was full
LSQ_count                  75479055 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2214 # avg LSQ occupancy (insn's)
lsq_rate                     1.4569 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6433 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  239220546 # total number of slip cycles
avg_sim_slip                17.9583 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
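
The cache counters of the fft run also satisfy two accounting checks that follow from the configuration. First, il2 is pointed at dl2 and dl1 writes back dirty blocks. Every ul2 access should therefore be an il1 miss, a dl1 miss, or a dl1 writeback. The same identity holds for the previous run: 217 + 11205 + 7079 = 18501. Second, no invalidations occurred, and each miss that does not evict a valid block must fill one of the <nsets> * <assoc> initially empty blocks. A structure therefore has at least (misses - <nsets> * <assoc>) replacements, with equality once every block has been filled. The Python sketch below checks both conditions using values copied from the statistics above. It also reports misses per thousand committed instructions (MPKI), a common normalisation that this log does not print. The helper names are hypothetical.

```python
# Counters copied from the fft run's statistics.
FFT = {
    "sim_num_insn": 13320908,
    "il1.misses": 727, "il1.replacements": 30,
    "dl1.misses": 287722, "dl1.replacements": 286698, "dl1.writebacks": 143444,
    "ul2.accesses": 431893, "ul2.misses": 32394, "ul2.replacements": 24202,
    "dtlb.misses": 4144, "dtlb.replacements": 4016,
}

# Blocks (TLB entries) per structure = <nsets> * <assoc> from the config dump.
BLOCKS = {"il1": 512 * 2, "dl1": 512 * 2, "ul2": 1024 * 8, "dtlb": 32 * 4}


def expected_l2_accesses(s):
    """il1 misses + dl1 misses + dl1 writebacks, assuming il2 -> dl2 (unified L2)."""
    return s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]


def min_replacements(s, unit):
    """Lower bound: at most BLOCKS[unit] misses can fill an empty block."""
    return max(0, s[f"{unit}.misses"] - BLOCKS[unit])


def mpki(s, unit):
    """Misses per 1000 committed instructions."""
    return 1000 * s[f"{unit}.misses"] / s["sim_num_insn"]


print("ul2 accesses:", expected_l2_accesses(FFT), "expected,", FFT["ul2.accesses"], "logged")
for unit in BLOCKS:
    print(f"{unit}: replacements >= {min_replacements(FFT, unit)}, logged {FFT[unit + '.replacements']}")
print(f"MPKI: dl1 {mpki(FFT, 'dl1'):.1f}, ul2 {mpki(FFT, 'ul2'):.1f}")
```

For fft, il1 takes fewer misses (727) than it has blocks (1024), so it is the one structure where the replacement bound is not tight.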

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:11:28 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
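The `-mem:lat` option above is described as a first-chunk latency plus an inter-chunk latency. The sketch below applies that arithmetic under two assumptions. First, a cache block crosses the memory bus in ceil(block size / `-mem:width`) chunks. Second, the `-mem:maxBurstLength`/`-mem:minBurstLength` options leave the result unchanged; this output does not explain what those options do. `block_fetch_latency` is a hypothetical helper, not a SimpleScalar function.

```python
def block_fetch_latency(bsize: int, bus_width: int,
                        first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one cache block from memory (sketch; see assumptions)."""
    # Number of bus-width chunks needed to carry bsize bytes (ceiling division).
    chunks = (bsize + bus_width - 1) // bus_width
    # The first chunk pays the full latency; each later chunk adds inter_chunk.
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks with the options dumped above (-mem:width 64, -mem:lat 72 4).
print(block_fetch_latency(64, 64, 72, 4))   # 72 (one chunk)
# The later `matrix` run uses -mem:width 8 -mem:lat 60 3.
print(block_fetch_latency(64, 8, 60, 3))    # 81 (eight chunks)
```

With a 64-byte bus and 64-byte `ul2` blocks, the inter-chunk term never applies in this run. It only matters for the narrower 8-byte bus used by the `matrix` runs.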

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced, and those that start with `#' designate a cycle-count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
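Here is a sketch of how the range grammar above could be parsed. `parse_ptrace_range` is hypothetical and not part of SimpleScalar. It resolves program symbols only through a dictionary you supply. It reads numbers with Python's `int(tok, 0)`, so hexadecimal needs an explicit `0x`; the help text does not say which base sim-outorder assumes.

```python
from typing import Dict, Optional, Tuple

Bound = Optional[Tuple[str, int]]   # (kind, value), or None for an open end
_PREFIX_KIND = {"@": "addr", "#": "cycle"}


def _value(tok: str, symbols: Dict[str, int]) -> int:
    """Resolve a program symbol, else parse a decimal or 0x-prefixed number."""
    if tok in symbols:
        return symbols[tok]
    return int(tok, 0)


def _bound(tok: str, symbols: Dict[str, int]) -> Tuple[str, int]:
    """Read an optional '@'/'#' prefix; unprefixed values count instructions."""
    kind = _PREFIX_KIND.get(tok[0], "inst")
    body = tok[1:] if tok[0] in _PREFIX_KIND else tok
    return kind, _value(body, symbols)


def parse_ptrace_range(spec: str,
                       symbols: Optional[Dict[str, int]] = None
                       ) -> Tuple[Bound, Bound]:
    """Split '<start>:<end>' into two bounds; an empty side means 'open'."""
    symbols = symbols or {}
    start_s, sep, end_s = spec.partition(":")
    if not sep:
        raise ValueError("expected '<start>:<end>'")
    start = _bound(start_s, symbols) if start_s else None
    if not end_s:
        return start, None
    if end_s[0] == "+":
        if start is None:
            raise ValueError("a '+' end needs a start value")
        # Relative end: same kind as the start, offset from its value.
        return start, (start[0], start[1] + _value(end_s[1:], symbols))
    return start, _bound(end_s, symbols)
```

For example, `parse_ptrace_range("1000:+100")` returns `(('inst', 1000), ('inst', 1100))`. This matches the `1000:+100 == 1000:1100` equivalence stated above.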

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
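The sketch below makes the N, M, W and X parameters concrete by showing one common way to form the second-level counter index. It is an assumption, not a transcription of SimpleScalar's `bpred.c`. The PC is shifted right by 3 (assuming 8-byte PISA instructions), and M is assumed to be a power of two. Only a single global history register is modeled, so the per-address PAg/PAp variants (which pick one of N registers by branch address) are left out. Both functions are hypothetical.

```python
def update_history(history: int, taken: bool, W: int) -> int:
    """Shift the newest branch outcome into a W-bit history register."""
    return ((history << 1) | int(taken)) & ((1 << W) - 1)


def l2_index(pc: int, history: int, W: int, M: int, X: int,
             inst_shift: int = 3) -> int:
    """Pick one of M second-level counters from W history bits and the PC."""
    hist = history & ((1 << W) - 1)
    addr = pc >> inst_shift          # drop byte-offset bits (assumed 8-byte insts)
    if X:
        idx = hist ^ addr            # gshare-style: xor history with address
    else:
        idx = hist | (addr << W)     # address bits sit above the history bits
    return idx % M                   # fold into the M counters (M a power of two)
```

With M = 2^W and X = 0, the modulo keeps only the history bits (GAg). A larger M lets low address bits select among history-indexed sets (GAp), and X = 1 gives the gshare xor.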

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
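The total capacity implied by a `<config>` string is sets x ways x block size. The short sketch below applies that formula to the configurations used in these runs. `CacheConfig` and `parse_cache_config` are hypothetical names. For TLB entries, the sketch reads `<bsize>` as the page size to get TLB reach, which is an interpretation.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int      # bytes per block (page size, for a TLB)
    assoc: int      # ways per set
    repl: str       # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity(self) -> int:
        """Bytes covered: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity // 1024, "KiB")
```

This prints 64 KiB for `dl1` (`il1` is configured identically), 512 KiB for the unified `ul2`, and 512 KiB of reach for the 128-entry `dtlb`.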

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
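Pointing a level at `dl1` or `dl2` makes it share that cache instead of getting its own. The sketch below resolves those pointers and applies the result to the hierarchy dumped above, where `-cache:il2 dl2` yields a unified L2. `owner` is a hypothetical helper.

```python
def owner(opts: dict, level: str) -> str:
    """Follow 'dl1'/'dl2' pointers until reaching a level with its own <config>."""
    seen = set()
    while opts[level] in ("dl1", "dl2"):
        if level in seen:
            raise ValueError(f"cyclic cache pointer at {level}")
        seen.add(level)
        level = opts[level]
    return level


# Cache options from the dump above; il2 points at dl2.
opts = {
    "il1": "il1:512:64:2:l", "il2": "dl2",
    "dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l",
}
print(owner(opts, "il2"), owner(opts, "il1"))   # dl2 il1
```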



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12555950 # total simulation time in cycles
sim_IPC                      1.7027 # instructions per cycle
sim_CPI                      0.5873 # cycles per instruction
sim_exec_BW                  1.7084 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49080422 # cumulative IFQ occupancy
IFQ_fcount                 11650933 # cumulative IFQ full count
ifq_occupancy                3.9089 # avg IFQ occupancy (insn's)
ifq_rate                     1.7084 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2881 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199457825 # cumulative RUU occupancy
RUU_fcount                 12430808 # cumulative RUU full count
ruu_occupancy               15.8855 # avg RUU occupancy (insn's)
ruu_rate                     1.7084 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2987 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  63945155 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0928 # avg LSQ occupancy (insn's)
lsq_rate                     1.7084 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9811 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291185841 # total number of slip cycles
avg_sim_slip                13.6200 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
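Several derived statistics above can be recomputed from the raw counters, which is a useful sanity check when comparing runs. The formulas below are inferred from the stat descriptions. For this `filter` run they agree with the printed values to four decimal places. The function names are hypothetical.

For the fetch and dispatch queues, occupancy = count / cycles and latency = count / executed instructions. Occupancy therefore equals dispatch rate x latency (Little's law) by construction. Branch misses equal updates minus direction hits in both runs of this section.

```python
from typing import Optional


def ratio(num: float, den: float) -> Optional[float]:
    """num/den, or None where sim-outorder prints '<error: divide by zero>'."""
    return num / den if den else None


def core_rates(num_insn, total_insn, num_branches, cycles):
    return {
        "sim_IPC": num_insn / cycles,        # committed insts per cycle
        "sim_CPI": cycles / num_insn,        # its reciprocal
        "sim_exec_BW": total_insn / cycles,  # executed (incl. mis-speculated) per cycle
        "sim_IPB": num_insn / num_branches,  # committed insts per committed branch
    }


def queue_rates(count, fcount, total_insn, cycles):
    """(occupancy, dispatch rate, latency, full fraction) for IFQ/RUU/LSQ."""
    occupancy = count / cycles
    rate = total_insn / cycles
    latency = count / total_insn
    # Little's law: occupancy == rate * latency by construction.
    return occupancy, rate, latency, fcount / cycles


def bpred_rates(updates, addr_hits, dir_hits, jr_hits, jr_seen):
    return {
        "misses": updates - dir_hits,
        "bpred_addr_rate": ratio(addr_hits, updates),
        "bpred_dir_rate": ratio(dir_hits, updates),
        "bpred_jr_rate": ratio(jr_hits, jr_seen),
    }


# Counters from the `filter` run above.
core = core_rates(21379256, 21450114, 1647342, 12555950)
ifq = queue_rates(49080422, 11650933, 21450114, 12555950)
ruu = queue_rates(199457825, 12430808, 21450114, 12555950)
bp = bpred_rates(1647342, 1638967, 1639058, 45, 52)
print(round(core["sim_IPC"], 4), round(ruu[0], 4), bp["misses"])   # 1.7027 15.8855 8284
```

The RUU (`-ruu:size 16`) averages about 15.9 occupied entries and is full in 99% of cycles. The LSQ is never full. In this run, the instruction window rather than the LSQ therefore appears to be the limiting structure.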

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 4 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:11:41 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 4 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861171 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37161453 # total simulation time in cycles
sim_IPC                      0.7497 # instructions per cycle
sim_CPI                      1.3339 # cycles per instruction
sim_exec_BW                  0.7497 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 148578185 # cumulative IFQ occupancy
IFQ_fcount                 37144306 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7497 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3328 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 594315044 # cumulative RUU occupancy
RUU_fcount                 37143590 # cumulative RUU full count
ruu_occupancy               15.9928 # avg RUU occupancy (insn's)
ruu_rate                     0.7497 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.3313 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 180662419 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8616 # avg LSQ occupancy (insn's)
lsq_rate                     0.7497 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4844 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  811486162 # total number of slip cycles
avg_sim_slip                29.1276 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
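The `alphaBlend` run differs sharply from `filter` in its L1 data-cache behavior: its miss rate is 0.2972, versus 0.0006 for `filter`. This is consistent with its lower IPC. The sketch below recomputes the cache rates and adds two observations:

- In both runs here, `ul2.accesses` equals `il1.misses + dl1.misses + dl1.writebacks`.
- A textbook average-memory-access-time (AMAT) estimate can be computed from the cache rates.

The AMAT formula is a simplification, not what sim-outorder computes. It ignores overlap from out-of-order execution, TLB misses and writeback traffic. It uses the unified `ul2` miss rate, which includes instruction fetches. It also assumes one 72-cycle bus chunk per 64-byte block. All function names are hypothetical.

```python
def cache_rates(accesses, hits, misses, replacements, writebacks):
    """Per-reference rates in the form sim-outorder prints (misses/ref, etc.)."""
    if hits + misses != accesses:
        raise ValueError("hits + misses should equal accesses")
    return {
        "miss_rate": misses / accesses,
        "repl_rate": replacements / accesses,
        "wb_rate": writebacks / accesses,
    }


def l2_traffic(il1_misses, dl1_misses, dl1_writebacks):
    """Requests reaching the unified L2 (observed to match ul2.accesses here)."""
    return il1_misses + dl1_misses + dl1_writebacks


def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Textbook AMAT; ignores overlap, TLB misses and writebacks."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Counters from the `alphaBlend` run above.
dl1 = cache_rates(8653398, 6081185, 2572213, 2571189, 957274)
ul2 = cache_rates(3529698, 3461377, 68321, 60129, 20012)
print(l2_traffic(211, 2572213, 957274))   # 3529698, equal to ul2.accesses
# -cache:dl1lat 3, -cache:dl2lat 12, 72 cycles per 64-byte block (one bus chunk).
print(round(amat(3, dl1["miss_rate"], 12, ul2["miss_rate"], 72), 2))   # 6.98
```

The same estimate for `filter` comes out near 3.04 cycles, essentially the L1 hit latency.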

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:12:05 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
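As a rough cross-check on the memory options above, the sketch below estimates the cycles needed to bring one cache block in from main memory. It assumes the stock sim-outorder model: the first bus-width chunk costs the first `-mem:lat` value and every further chunk costs the second. The `-mem:minBurstLength` and `-mem:maxBurstLength` options appear to come from a modified build and are not modeled. The function name `block_fetch_latency` is hypothetical.

```python
def block_fetch_latency(first_chunk: int, inter_chunk: int,
                        block_bytes: int, bus_width: int) -> int:
    """Estimate cycles to fetch one cache block from main memory.

    Assumption: stock sim-outorder model (first-chunk latency plus the
    inter-chunk latency for each remaining bus-width chunk). Burst-length
    limits are ignored.
    """
    # Number of bus transfers needed for one block, rounded up.
    chunks = (block_bytes + bus_width - 1) // bus_width
    return first_chunk + inter_chunk * (chunks - 1)


# This run: -mem:lat 60 3, -mem:width 8, 64-byte ul2 blocks -> 8 chunks.
print(block_fetch_latency(60, 3, 64, 8))   # 81
```

Under the same assumption, the `-mem:lat 91 10` setting used in this file's first run works out to 161 cycles per 64-byte block.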

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
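The range grammar above can be illustrated with a small parser. This is a hypothetical Python sketch (`parse_range` and `resolve_end` are illustrative names, not part of SimpleScalar). It splits a range into typed endpoints and keeps values as text because program symbols such as `main` are allowed. It only resolves a `+` end to an absolute value when both sides are plain decimal numbers, and it does not handle the file-name argument that precedes the range.

```python
KINDS = {"@": "addr", "#": "cycle"}    # any other endpoint counts instructions


def parse_pos(text):
    """Split one endpoint into (kind, value); None when it is omitted."""
    if not text:
        return None
    if text[0] in KINDS:
        return KINDS[text[0]], text[1:]
    return "inst", text


def parse_range(spec):
    """Parse '{{@|#}<start>}:{{@|#|+}<end>}' -> (start, end, end_is_relative)."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = parse_pos(start_text)
    relative = end_text.startswith("+")
    if relative:
        if start is None:
            raise ValueError("a '+' end needs a start to be relative to")
        end = (start[0], end_text[1:])      # relative end keeps the start's kind
    else:
        end = parse_pos(end_text)
    return start, end, relative


def resolve_end(start, end, relative):
    """Make a '+' end absolute when both values are decimal numbers."""
    if relative and start[1].isdigit() and end[1].isdigit():
        return end[0], str(int(start[1]) + int(end[1]))
    return end                              # symbols like 'main' stay unresolved


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    start, end, rel = parse_range(spec)
    print(spec, start, resolve_end(start, end, rel))
```

For example, `1000:+100` resolves its end to `("inst", "1100")`, matching the `1000:+100 == 1000:1100` rule in the text.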

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
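To make the N, M, W, X parameters concrete, here is a minimal two-level predictor sketch in Python. The index arithmetic is a simplified illustration, not a transcription of the simulator's own predictor code. The first-level register is chosen by PC, and the second-level counter is chosen by the W history bits (optionally XORed with the PC, gshare-style). When M > 2^W, the extra index bits come from the PC. The class name `TwoLevelPredictor` and the 3-bit PC shift (assuming 8-byte PISA instructions) are assumptions. Constructor arguments follow the `-bpred:2lev` order `<l1size> <l2size> <hist_size> <xor>`.

```python
class TwoLevelPredictor:
    """Sketch of a two-level direction predictor.

    l1size = N history registers, l2size = M 2-bit counters,
    hist_size = W history bits, xor = X. Simplified indexing (assumption).
    """

    def __init__(self, l1size, l2size, hist_size, xor, pc_shift=3):
        if l2size & (l2size - 1):
            raise ValueError("l2size (M) must be a power of two")
        self.l1size, self.l2size = l1size, l2size
        self.hist_size, self.xor = hist_size, bool(xor)
        self.pc_shift = pc_shift              # assumed 8-byte PISA instructions
        self.history = [0] * l1size           # first level: W-bit shift registers
        self.counters = [2] * l2size          # second level: 2-bit counters, weakly taken

    def _index(self, pc):
        word = pc >> self.pc_shift
        l1 = word % self.l1size               # global history when N = 1
        hist = self.history[l1]
        if self.xor:                          # gshare-style: hash history with PC bits
            hist = (hist ^ word) & ((1 << self.hist_size) - 1)
        # When M > 2^W, the remaining index bits come from the PC (GAp/PAp).
        l2 = (hist | (word << self.hist_size)) % self.l2size
        return l1, l2

    def predict(self, pc):
        """Return True to predict taken."""
        return self.counters[self._index(pc)[1]] >= 2

    def update(self, pc, taken):
        l1, l2 = self._index(pc)
        c = self.counters[l2]
        self.counters[l2] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.hist_size) - 1
        self.history[l1] = ((self.history[l1] << 1) | int(taken)) & mask


# GAg with W = 2 (N = 1, M = 2^W = 4, X = 0) on an alternating branch.
p = TwoLevelPredictor(1, 4, 2, False)
pc = 0x00400140
for taken in [True, False] * 20:              # warm up
    p.update(pc, taken)
hits = 0
for taken in [True, False] * 5:
    hits += p.predict(pc) == taken
    p.update(pc, taken)
print(hits, "of 10")                          # 10 of 10
```

The history bits let the GAg example separate the two phases of the alternating branch. A single 2-bit bimodal counter would mispredict such a branch about half the time.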

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
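The numeric fields of `<config>` determine a cache's capacity as nsets x bsize x assoc. A minimal parsing sketch follows; the `CacheConfig` and `parse_cache_config` names are hypothetical. Applied to this run's settings, it gives 64 KB for dl1 and il1 and 512 KB for ul2. For the data TLB, whose "block" is a 4 KB page, the same product is the TLB's reach.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int    # block size in bytes (page size for a TLB)
    assoc: int
    repl: str     # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into its fields."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KB")
# dl1 64 KB
# ul2 512 KB
# dtlb 512 KB
```

Pointer values such as `dl2` in `-cache:il2 dl2` are not `<config>` strings, so this sketch rejects them with a `ValueError`.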

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6818790 # total simulation time in cycles
sim_IPC                      1.9222 # instructions per cycle
sim_CPI                      0.5202 # cycles per instruction
sim_exec_BW                  1.9283 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25387408 # cumulative IFQ occupancy
IFQ_fcount                  6221249 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9283 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9307 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104828303 # cumulative RUU occupancy
RUU_fcount                  5657410 # cumulative RUU full count
ruu_occupancy               15.3734 # avg RUU occupancy (insn's)
ruu_rate                     1.9283 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9723 # avg RUU occupant latency (cycle's)
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
LSQ_count                  32032170 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6976 # avg LSQ occupancy (insn's)
lsq_rate                     1.9283 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4361 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153911549 # total number of slip cycles
avg_sim_slip                11.7426 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
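Most ratios in the statistics block above can be recomputed from the raw counters. The sketch below does this for the matrix run. It uses a hypothetical `ratio` helper that returns `None` on a zero denominator, which is the case the log prints as `<error: divide by zero>`. Counter names are shortened (e.g. `bpred.updates` for `bpred_bimod.updates`). The formulas were chosen because they reproduce the reported four-decimal values, so treat them as a consistency check rather than a description of the simulator's internals. In particular, the ul2 access count equalling il1 misses plus dl1 misses plus dl1 writebacks holds for both complete runs in this log but is an observation, not a documented rule.

```python
def ratio(num, den):
    """num/den, or None when den == 0 (the log prints '<error: divide by zero>')."""
    return None if den == 0 else num / den


# Raw counters copied from the matrix run above (names shortened).
s = {
    "sim_num_insn": 13107124, "sim_total_insn": 13148990,
    "sim_num_branches": 1010850, "sim_cycle": 6818790, "sim_slip": 153911549,
    "IFQ_fcount": 6221249, "RUU_count": 104828303, "RUU_fcount": 5657410,
    "bpred.updates": 1010850, "bpred.addr_hits": 1000567,
    "bpred.dir_hits": 1000659, "bpred.jr_hits": 46, "bpred.jr_seen": 52,
    "bpred.jr_non_ras_hits": 0, "bpred.jr_non_ras_seen": 0,
    "il1.misses": 178, "dl1.accesses": 4013174, "dl1.misses": 5727,
    "dl1.writebacks": 492, "ul2.accesses": 6397, "ul2.misses": 2268,
}


def derived(s):
    cyc, insn, total = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    return {
        "sim_IPC": ratio(insn, cyc),
        "sim_CPI": ratio(cyc, insn),
        "sim_exec_BW": ratio(total, cyc),        # includes mis-speculated insts
        "sim_IPB": ratio(insn, s["sim_num_branches"]),
        "ruu_occupancy": ratio(s["RUU_count"], cyc),
        "ruu_latency": ratio(s["RUU_count"], total),
        "ruu_full": ratio(s["RUU_fcount"], cyc),
        "ifq_full": ratio(s["IFQ_fcount"], cyc),
        "avg_sim_slip": ratio(s["sim_slip"], insn),
        "bpred_addr_rate": ratio(s["bpred.addr_hits"], s["bpred.updates"]),
        "bpred_dir_rate": ratio(s["bpred.dir_hits"], s["bpred.updates"]),
        "bpred_jr_rate": ratio(s["bpred.jr_hits"], s["bpred.jr_seen"]),
        "bpred_jr_non_ras_rate": ratio(s["bpred.jr_non_ras_hits"],
                                       s["bpred.jr_non_ras_seen"]),
        "bpred_misses": s["bpred.updates"] - s["bpred.dir_hits"],
        "dl1.miss_rate": ratio(s["dl1.misses"], s["dl1.accesses"]),
        "ul2.miss_rate": ratio(s["ul2.misses"], s["ul2.accesses"]),
        # Observed L2 traffic: misses from both L1s plus dl1 writebacks.
        "ul2.accesses": s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"],
    }


for name, value in derived(s).items():
    print(name, "n/a" if value is None else round(value, 4))
```

Each printed value should match the corresponding line of the matrix statistics to four decimals, and `bpred_jr_non_ras_rate` prints `n/a` where the log shows the divide-by-zero error.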

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:12:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264701 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6364079 # total simulation time in cycles
sim_IPC                      1.8198 # instructions per cycle
sim_CPI                      0.5495 # cycles per instruction
sim_exec_BW                  1.9272 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18764807 # cumulative IFQ occupancy
IFQ_fcount                  3863042 # cumulative IFQ full count
ifq_occupancy                2.9486 # avg IFQ occupancy (insn's)
ifq_rate                     1.9272 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5300 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6070 # fraction of time (cycle's) IFQ was full
RUU_count                  77305303 # cumulative RUU occupancy
RUU_fcount                  3261117 # cumulative RUU full count
ruu_occupancy               12.1471 # avg RUU occupancy (insn's)
ruu_rate                     1.9272 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3031 # avg RUU occupant latency (cycle's)
ruu_full                     0.5124 # fraction of time (cycle's) RUU was full
LSQ_count                  32135197 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0495 # avg LSQ occupancy (insn's)
lsq_rate                     1.9272 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6201 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123507596 # total number of slip cycles
avg_sim_slip                10.6642 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
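To compare runs such as matrix and sort, it helps to load each statistics block into a dictionary. The sketch below is a hypothetical helper, not a SimpleScalar tool. It assumes each statistic is printed as `name value # description` and stops at the dashed separator before the next run. It converts hex, integer, and floating-point values, keeps suffixed sizes such as `292k` as strings, and maps `<error: divide by zero>` to `None`. The sample mixes a few lines taken from the logs above.

```python
import re

# name, value, and description of one "name value # description" line.
STAT_LINE = re.compile(r"^(\S+)\s+(.+?)\s+#\s*(.*)$")


def convert(raw):
    """Convert one printed statistic value to a Python value."""
    if raw.startswith("<error"):
        return None                       # e.g. '<error: divide by zero>'
    if raw.lower().startswith("0x"):
        return int(raw, 16)               # segment bases such as 0x00400000
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw                        # e.g. '292k'


def parse_stats(text):
    """Collect the statistics block of one sim-outorder run into a dict."""
    stats, in_stats = {}, False
    for line in text.splitlines():
        if line.startswith("sim: ** simulation statistics **"):
            in_stats = True
        elif in_stats and line.startswith("----"):
            break                         # separator before the next run
        elif in_stats:
            m = STAT_LINE.match(line)
            if m:
                stats[m.group(1)] = convert(m.group(2))
    return stats


sample = """sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6364079 # total simulation time in cycles
sim_IPC                      1.8198 # instructions per cycle
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   292k # total size of memory pages allocated
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate
--------------------------------------------------------------------------------
sim_num_insn                      1 # belongs to the next run
"""
stats = parse_stats(sample)
print(round(stats["sim_num_insn"] / stats["sim_cycle"], 4))   # 1.8198
```

Because parsing stops at the separator, running it on a concatenated log like this one returns only the first run's block. Splitting the text on the dashed lines first gives one dictionary per run.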

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:12:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
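
The main-memory settings in this run (`-mem:lat 60 3`, `-mem:width 8`) can be turned into a per-block miss penalty. The sketch below assumes the stock sim-outorder model: a block is transferred in bus-width chunks, the first chunk costs the first-chunk latency, and each remaining chunk costs the inter-chunk latency. `-mem:minBurstLength` and `-mem:maxBurstLength` do not appear in the stock 3.0 option list, so their effect is not modeled here. The function name is hypothetical.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Approximate cycles to fetch one cache block from main memory.

    Assumes the stock sim-outorder chunked-transfer model; burst-length
    options are ignored.
    """
    chunks = (block_size + bus_width - 1) // bus_width  # round up to whole chunks
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l); the memory bus is 8 bytes wide.
print(mem_access_latency(64, 8, 60, 3))  # 81
```

With 64-byte ul2 blocks, this gives roughly 81 cycles per block fetched from memory, before any burst-length adjustment.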

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
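
The range grammar above can be illustrated with a small parser. This is a hypothetical helper, since `parse_range` is not part of SimpleScalar. It accepts only numeric start and end values, which it reads with Python's `int(text, 0)`, and it does not resolve program symbols such as `main`.

```python
from typing import Optional, Tuple

# Prefix characters from the help text: '@' = address, '#' = cycle count.
PREFIX_KINDS = {"@": "address", "#": "cycle"}


def _split_kind(text: str) -> Tuple[str, str]:
    """Strip an optional '@'/'#' prefix; unprefixed values count instructions."""
    if text and text[0] in PREFIX_KINDS:
        return PREFIX_KINDS[text[0]], text[1:]
    return "instruction", text


def parse_range(spec: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Parse '{{@|#}<start>}:{{@|#|+}<end>}' into (kind, start, end).

    None marks an omitted end. Only numeric values are accepted here.
    """
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")

    kind, start = "instruction", None
    if start_txt:
        kind, digits = _split_kind(start_txt)
        start = int(digits, 0)

    end = None
    if end_txt.startswith("+"):
        # '+' makes the end relative to the start, e.g. 1000:+100 == 1000:1100.
        if start is None:
            raise ValueError("a relative end needs a start value")
        end = start + int(end_txt[1:], 0)
    elif end_txt:
        end_kind, digits = _split_kind(end_txt)
        if start is not None and end_kind != kind:
            raise ValueError(f"range ends disagree: {kind} vs {end_kind}")
        kind, end = end_kind, int(digits, 0)

    return kind, start, end


for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
    print(spec, "->", parse_range(spec))
```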

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
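
The `-bpred:2lev` option takes its values as `<l1size> <l2size> <hist_size> <xor>`, which is the N, M, W, X order of the "Configurations" line. The sample list above, however, writes them in N, W, M, X order. The following hypothetical helper emits the option value in option order for the GAg, GAp, PAg, and gshare rows. PAp is left out.

```python
def bpred_2lev_value(scheme: str, hist_width: int,
                     l1_entries: int = 1, l2_entries: int = 0) -> str:
    """Return '<l1size> <l2size> <hist_size> <xor>' for a sample 2-level scheme.

    Hypothetical helper; follows the GAg/GAp/PAg/gshare rows of the help text.
    """
    w = hist_width
    if scheme == "GAg":        # one global history register, 2^W counters
        n, m, x = 1, 2 ** w, 0
    elif scheme == "gshare":   # as GAg, but history XORed with the address
        n, m, x = 1, 2 ** w, 1
    elif scheme == "PAg":      # N per-address histories sharing 2^W counters
        n, m, x = l1_entries, 2 ** w, 0
    elif scheme == "GAp":      # one global history, more than 2^W counters
        if l2_entries <= 2 ** w:
            raise ValueError("GAp needs more than 2^W second-level entries")
        n, m, x = 1, l2_entries, 0
    else:
        raise ValueError(f"unsupported scheme: {scheme}")
    return f"{n} {m} {w} {x}"


print(bpred_2lev_value("GAp", 8, l2_entries=1024))  # 1 1024 8 0
print(bpred_2lev_value("gshare", 12))               # 1 4096 12 1
```

The first call reproduces the `-bpred:2lev 1 1024 8 0` value listed in the options above. By the help text's definitions, that value is a GAp configuration (1024 > 2^8). It is unused in these runs because `-bpred bimod` is selected.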

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
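
Cache capacity is the product of the three numeric fields, so the cache configurations used in these runs can be sized with a short sketch. `parse_cache_config` and `CacheConfig` are illustrative names, not SimpleScalar APIs, and the sketch assumes `<bsize>` is given in bytes.

```python
from dataclasses import dataclass

# Replacement-policy letters from the help text.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (assumed)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KB", cfg.repl)
# dl1 64 KB LRU
# il1 64 KB LRU
# ul2 512 KB LRU
```

The TLB options use the same format, with `<bsize>` as the page size. Under that format, `dtlb:32:4096:4:l` describes 32 x 4 = 128 entries.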

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375116 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9349714 # total simulation time in cycles
sim_IPC                      1.4247 # instructions per cycle
sim_CPI                      0.7019 # cycles per instruction
sim_exec_BW                  1.4305 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  36428943 # cumulative IFQ occupancy
IFQ_fcount                  8956591 # cumulative IFQ full count
ifq_occupancy                3.8963 # avg IFQ occupancy (insn's)
ifq_rate                     1.4305 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7236 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 146628252 # cumulative RUU occupancy
RUU_fcount                  8815672 # cumulative RUU full count
ruu_occupancy               15.6826 # avg RUU occupancy (insn's)
ruu_rate                     1.4305 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.9628 # avg RUU occupant latency (cycle's)
ruu_full                     0.9429 # fraction of time (cycle's) RUU was full
LSQ_count                  77027739 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2385 # avg LSQ occupancy (insn's)
lsq_rate                     1.4305 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7590 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  243371337 # total number of slip cycles
avg_sim_slip                18.2699 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
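
Several of the printed rates are plain ratios of the raw counters above, so they can be recomputed from a saved log. A minimal sketch follows. It assumes each statistic line has the form `name value # description` and skips lines whose value is not a plain number, such as hex addresses, `292k`, or `<error: divide by zero>`. The excerpt is copied from the fft run.

```python
FFT_EXCERPT = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   9349714 # total simulation time in cycles
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
ul2.accesses                 431893 # total number of accesses
ul2.misses                    32394 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""


def parse_stats(text: str) -> dict:
    """Map statistic names to float values, skipping non-numeric entries."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing description
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # e.g. hex addresses or '292k'
    return stats


stats = parse_stats(FFT_EXCERPT)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]
print(f"IPC {ipc:.4f}, dl1 miss rate {dl1_miss_rate:.4f}, ul2 miss rate {ul2_miss_rate:.4f}")
# IPC 1.4247, dl1 miss rate 0.0466, ul2 miss rate 0.0750
```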

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:12:32 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12576164 # total simulation time in cycles
sim_IPC                      1.7000 # instructions per cycle
sim_CPI                      0.5882 # cycles per instruction
sim_exec_BW                  1.7056 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49155014 # cumulative IFQ occupancy
IFQ_fcount                 11669581 # cumulative IFQ full count
ifq_occupancy                3.9086 # avg IFQ occupancy (insn's)
ifq_rate                     1.7056 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2916 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199756634 # cumulative RUU occupancy
RUU_fcount                 12449456 # cumulative RUU full count
ruu_occupancy               15.8837 # avg RUU occupancy (insn's)
ruu_rate                     1.7056 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3126 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64038863 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0921 # avg LSQ occupancy (insn's)
lsq_rate                     1.7056 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9855 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291578304 # total number of slip cycles
avg_sim_slip                13.6384 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
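
The IFQ, RUU, and LSQ averages appear to be derived from the cumulative counters in three ways. Occupancy is the count divided by `sim_cycle`. Latency is the count divided by `sim_total_insn`. The rate is `sim_total_insn / sim_cycle`, which is why all three `*_rate` lines equal `sim_exec_BW`. These formulas are inferred from the printed values. The sketch below, built around a hypothetical `queue_metrics` helper, reproduces the filter run's RUU figures to four decimals. It also shows that occupancy = rate x latency (Little's law) holds by construction.

```python
import math


def queue_metrics(count: int, full_count: int, cycles: int, insts: int) -> dict:
    """Derive sim-outorder-style queue averages from cumulative counters."""
    occupancy = count / cycles   # average entries held per cycle
    rate = insts / cycles        # instructions (incl. mis-speculated) entering per cycle
    latency = count / insts      # average cycles each instruction stays in the queue
    full = full_count / cycles   # fraction of cycles the queue was full
    return {"occupancy": occupancy, "rate": rate, "latency": latency, "full": full}


# RUU counters and totals from the filter run above.
ruu = queue_metrics(count=199756634, full_count=12449456,
                    cycles=12576164, insts=21450114)
for key, value in ruu.items():
    print(f"{key:9s} {value:.4f}")
# occupancy 15.8837
# rate      1.7056
# latency   9.3126
# full      0.9899

# Little's law: occupancy equals rate times latency by construction.
assert math.isclose(ruu["occupancy"], ruu["rate"] * ruu["latency"])
```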

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:12:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses OK)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
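The -mem:lat pair (<first_chunk> <inter_chunk>) and -mem:width jointly set how long a cache block takes to arrive from memory. The sketch below assumes the simple chunked model we understand stock sim-outorder to use: the block is split into bus-width chunks, the first chunk costs first_chunk cycles and each further chunk costs inter_chunk cycles. The -mem:maxBurstLength and -mem:minBurstLength options of this build are not part of that model and are ignored; block_fetch_latency is a hypothetical helper name.

```python
def block_fetch_latency(block_bytes: int, bus_width: int,
                        first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block over the memory bus (simple chunked model, assumed)."""
    # bus transfers needed; a partial final chunk still costs a whole transfer
    chunks = (block_bytes + bus_width - 1) // bus_width
    # the first chunk pays the full access latency, each later one adds inter_chunk
    return first_chunk + (chunks - 1) * inter_chunk


if __name__ == "__main__":
    # ul2 blocks are 64 bytes in this run (ul2:1024:64:8:l), with -mem:lat 60 3
    for width in (8, 16):
        print(width, block_fetch_latency(64, width, 60, 3))
```

Under this model a 64-byte ul2 block with -mem:lat 60 3 takes 81 cycles at -mem:width 8 and 69 cycles at -mem:width 16.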

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
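A minimal sketch of parsing the range syntax described above. parse_ptrace_range and its ("addr" | "cycle" | "inst", value) endpoints are hypothetical names, not simulator internals. It assumes numbers are decimal or 0x-prefixed hex, that a `+' end inherits the start's kind, and that program symbols such as main are resolved through a caller-supplied table rather than read from the binary.

```python
from typing import Dict, Optional, Tuple

# labels for the range kinds; these names are our own, not SimpleScalar's
PREFIX_KIND = {"@": "addr", "#": "cycle"}

Endpoint = Optional[Tuple[str, int]]


def _value(text: str, symbols: Dict[str, int]) -> int:
    """Parse an integer literal (decimal or 0x-hex) or look up a program symbol."""
    try:
        return int(text, 0)
    except ValueError:
        if text in symbols:
            return symbols[text]
        raise ValueError(f"unknown symbol {text!r}")


def _endpoint(text: str, symbols: Dict[str, int]) -> Tuple[str, int]:
    # '@' = address, '#' = cycle, no prefix = instruction count
    kind = PREFIX_KIND.get(text[0], "inst")
    body = text[1:] if text[0] in PREFIX_KIND else text
    return kind, _value(body, symbols)


def parse_ptrace_range(spec: str,
                       symbols: Optional[Dict[str, int]] = None) -> Tuple[Endpoint, Endpoint]:
    """Split '<start>:<end>' into endpoints; None means that end is open."""
    symbols = symbols or {}
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    start = _endpoint(start_txt, symbols) if start_txt else None
    end: Endpoint = None
    if end_txt.startswith("+"):
        # '+' makes the end relative to the start, in the start's units
        if start is None:
            raise ValueError("a '+' end needs a start")
        end = (start[0], start[1] + _value(end_txt[1:], symbols))
    elif end_txt:
        end = _endpoint(end_txt, symbols)
    return start, end
```

For example, parse_ptrace_range("1000:+100") yields (("inst", 1000), ("inst", 1100)), matching the 1000:+100 == 1000:1100 rule.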

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in the same N, M, W, X order as -bpred:2lev):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
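To make the N, M, W, X parameters concrete, here is a simplified model of how a two-level predictor might form its second-level index. It is an illustration, not SimpleScalar's bpred code: l2_index and update_history are hypothetical names, N and M are assumed to be powers of two, and addr is assumed to be the branch address with its alignment bits already dropped.

```python
from typing import List


def l2_index(addr: int, shift_regs: List[int], n: int, w: int, m: int, xor: bool) -> int:
    """Second-level table index for an N, M, W, X two-level predictor (simplified model)."""
    # N and M are assumed to be powers of two, so masking acts as modulo
    l1 = addr & (n - 1)                     # pick one of the N history registers
    hist = shift_regs[l1] & ((1 << w) - 1)  # its W-bit branch history
    low = (hist ^ addr) & ((1 << w) - 1) if xor else hist
    # address bits above the history widen the index when M > 2^W (GAp/PAp style)
    return (low | (addr << w)) & (m - 1)


def update_history(shift_regs: List[int], addr: int, n: int, w: int, taken: bool) -> None:
    """Shift the resolved outcome into the selected W-bit history register."""
    l1 = addr & (n - 1)
    shift_regs[l1] = ((shift_regs[l1] << 1) | int(taken)) & ((1 << w) - 1)
```

With N = 1 and M = 2^W the index reduces to the history alone (GAg); setting X = 1 mixes address bits into it (gshare); and M > 2^W lets address bits above the history select among pattern tables (GAp).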

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
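Because <config> is five colon-separated fields, each cache's capacity follows directly as nsets * bsize * assoc bytes. The TLB options use the same format; there bsize appears to be the page size (4096 here), so nsets * assoc is the entry count and the product is the TLB reach. CacheConfig and parse_cache_config are hypothetical names in this sketch.

```python
from typing import NamedTuple

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # bytes per block (page size, for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        return self.nsets * self.assoc   # total blocks (TLB entries)

    @property
    def capacity(self) -> int:
        return self.blocks * self.bsize  # data capacity (TLB reach) in bytes


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)
```

For the configuration used in these runs, dl1 and il1 each come to 64 KB, ul2 to 512 KB, and the itlb (16 sets of 4 ways, 4096-byte pages) covers 256 KB.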

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
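When an instruction level is pointed at a data level, both share one physical cache, so its storage should be counted once. The sketch assumes the options are given as a dict from level name (il1, il2, dl1, dl2) to its -cache value; serving_cache and distinct_capacity are hypothetical helpers, not simulator functions.

```python
from typing import Dict

LEVELS = ("il1", "il2", "dl1", "dl2")


def _capacity(cfg: str) -> int:
    # <name>:<nsets>:<bsize>:<assoc>:<repl>  ->  nsets * bsize * assoc bytes
    _, nsets, bsize, assoc, _ = cfg.split(":")
    return int(nsets) * int(bsize) * int(assoc)


def serving_cache(level: str, opts: Dict[str, str]) -> str:
    """Name of the level whose storage serves `level`, or 'none'."""
    cfg = opts.get(level, "none")
    if cfg in ("dl1", "dl2"):            # e.g. -cache:il2 dl2 (unified level)
        cfg, level = opts.get(cfg, "none"), cfg
    return "none" if cfg == "none" else level


def distinct_capacity(opts: Dict[str, str]) -> int:
    """Total cache bytes, counting a unified (shared) cache only once."""
    owners = {serving_cache(lvl, opts) for lvl in LEVELS}
    owners.discard("none")
    return sum(_capacity(opts[o]) for o in owners)
```

For these runs (il2 pointed at dl2) that gives 64 KB + 64 KB + 512 KB = 640 KB; for the fully unified example above it gives 8 KB + 128 KB.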



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861243 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37568514 # total simulation time in cycles
sim_IPC                      0.7416 # instructions per cycle
sim_CPI                      1.3485 # cycles per instruction
sim_exec_BW                  0.7416 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 150198905 # cumulative IFQ occupancy
IFQ_fcount                 37549486 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insns)
ifq_rate                     0.7416 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3910 # avg IFQ occupant latency (cycles)
ifq_full                     0.9995 # fraction of time (cycles) IFQ was full
RUU_count                 600798311 # cumulative RUU occupancy
RUU_fcount                 37548752 # cumulative RUU full count
ruu_occupancy               15.9921 # avg RUU occupancy (insns)
ruu_rate                     0.7416 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.5639 # avg RUU occupant latency (cycles)
ruu_full                     0.9995 # fraction of time (cycles) RUU was full
LSQ_count                 182621278 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8610 # avg LSQ occupancy (insns)
lsq_rate                     0.7416 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5547 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  819928252 # total number of slip cycles
avg_sim_slip                29.4306 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              81 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
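The derived figures in the statistics dump can be recomputed from the raw counters, a useful sanity check when post-processing many runs: sim_IPC = sim_num_insn / sim_cycle, sim_CPI is its inverse, sim_IPB = sim_num_insn / sim_num_branches, and each *.miss_rate is misses / accesses. A minimal sketch, assuming every statistic sits on one `<name> <value> # <description>' line as above; parse_stats and derived_metrics are hypothetical names, and SAMPLE is copied from the alphaBlend run.

```python
from typing import Dict

SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_cycle                  37568514 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                  1480k # total size of memory pages allocated
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR rate
"""


def parse_stats(text: str) -> Dict[str, float]:
    """Collect numeric '<name> <value> # <description>' lines into a dict."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop the description
        if len(fields) != 2:
            continue                              # banners, '<error: ...>' values
        name, raw = fields
        try:
            stats[name] = float(raw)
        except ValueError:
            pass                                  # hex bases, '1480k', cache configs
    return stats


def derived_metrics(s: Dict[str, float]) -> Dict[str, float]:
    return {
        "IPC": s["sim_num_insn"] / s["sim_cycle"],
        "CPI": s["sim_cycle"] / s["sim_num_insn"],
        "IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "dl1_miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    }


if __name__ == "__main__":
    for key, value in derived_metrics(parse_stats(SAMPLE)).items():
        print(f"{key:14s} {value:.4f}")
```

Run as a script, it prints IPC 0.7416, CPI 1.3485, IPB 57.8355 and a dl1 miss rate of 0.2972, matching the values reported above. Numeric option lines such as -fetch:ifqsize would also be collected if fed in, which is harmless for this purpose.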

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:13:08 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses OK)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6795234 # total simulation time in cycles
sim_IPC                      1.9289 # instructions per cycle
sim_CPI                      0.5184 # cycles per instruction
sim_exec_BW                  1.9350 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25301584 # cumulative IFQ occupancy
IFQ_fcount                  6199793 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insns)
ifq_rate                     1.9350 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9242 # avg IFQ occupant latency (cycles)
ifq_full                     0.9124 # fraction of time (cycles) IFQ was full
RUU_count                 104484071 # cumulative RUU occupancy
RUU_fcount                  5635954 # cumulative RUU full count
ruu_occupancy               15.3761 # avg RUU occupancy (insns)
ruu_rate                     1.9350 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9462 # avg RUU occupant latency (cycles)
ruu_full                     0.8294 # fraction of time (cycles) RUU was full
LSQ_count                  31917162 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6970 # avg LSQ occupancy (insns)
lsq_rate                     1.9350 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4273 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  153452525 # total number of slip cycles
avg_sim_slip                11.7076 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
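The queue statistics are related by Little's law: average latency equals average occupancy divided by the dispatch rate. Recomputing them from the raw counters of the matrix run (IFQ_count, RUU_count, LSQ_count, sim_total_insn, sim_cycle) reproduces the printed ifq_*, ruu_* and lsq_* values to four decimals. That the simulator derives them exactly this way is our inference from the numbers, not something the log states; queue_metrics is a hypothetical helper.

```python
def queue_metrics(cumulative_count: int, total_insn: int, cycles: int):
    """Return (avg occupancy, avg dispatch rate, avg latency) for one queue."""
    occupancy = cumulative_count / cycles   # entries resident, averaged per cycle
    rate = total_insn / cycles              # instructions entering per cycle
    latency = occupancy / rate              # Little's law: W = L / lambda
    return occupancy, rate, latency


if __name__ == "__main__":
    # raw counters from the matrix run (-mem:width 16) above
    cycles, total_insn = 6795234, 13148990
    for name, count in (("IFQ", 25301584), ("RUU", 104484071), ("LSQ", 31917162)):
        occ, rate, lat = queue_metrics(count, total_insn, cycles)
        print(f"{name}: occupancy {occ:.4f}  rate {rate:.4f}  latency {lat:.4f}")
```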

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:13:17 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
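The -mem:lat and -mem:width options above set how long an L2 miss waits on main memory. The Python sketch below makes three assumptions:

- It follows the block-transfer model of stock sim-outorder 3.0. The first bus-width chunk of a block costs the first-chunk latency, and each remaining chunk adds the inter-chunk latency.
- -mem:maxBurstLength and -mem:minBurstLength do not belong to that stock model. They are ignored here, so treat the result as an approximation for this build.
- The function name is illustrative.

```python
def mem_access_latency(blk_size: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to bring one block in from main memory.

    Assumes the stock sim-outorder model: the block is split into
    bus_width-byte chunks; the first chunk costs `first_chunk` cycles and
    every later chunk adds `inter_chunk` cycles. Burst-length limits
    (-mem:maxBurstLength / -mem:minBurstLength) are NOT modeled here.
    """
    if blk_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_size + bus_width - 1) // bus_width  # round up
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # 64-byte ul2 blocks with -mem:lat 60 3 and -mem:width 16 (this run)
    print(mem_access_latency(64, 60, 3, 16))   # 69
    # Same block size with -mem:lat 91 10 and -mem:width 8
    print(mem_access_latency(64, 91, 10, 8))   # 161
```

Under this model, the wider 16-byte bus cuts a 64-byte block from 8 transfers to 4. That halves the chunk count, which matters more than the lower first-chunk latency alone.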

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
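The range grammar above can be sketched as a small Python parser. Its behaviour is as follows:

- A `@` prefix selects addresses and `#` selects cycles. Anything else is an instruction count.
- A `+` end is resolved relative to the start.
- A non-numeric value such as `main` is kept as an unresolved symbol.

Two points are simplifying assumptions rather than documented SimpleScalar behaviour: rejecting ends of different kinds, and keeping a symbol's `+` offset separately. The names `Pos` and `parse_ptrace_range` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Prefix characters and the kind of position they select.
PREFIXES = {"@": "address", "#": "cycle"}


@dataclass(frozen=True)
class Pos:
    kind: str                 # "address", "cycle", or "instruction"
    value: Union[int, str]    # a number, or an unresolved program symbol
    offset: int = 0           # extra offset applied to a symbol


def parse_pos(text: str) -> Pos:
    kind = PREFIXES.get(text[0], "instruction")
    body = text[1:] if text[0] in PREFIXES else text
    if not body:
        raise ValueError(f"empty position in {text!r}")
    try:
        return Pos(kind, int(body, 0))   # decimal, or 0x-prefixed hex
    except ValueError:
        return Pos(kind, body)           # a program symbol, e.g. "main"


def parse_ptrace_range(spec: str) -> Tuple[Optional[Pos], Optional[Pos]]:
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    start = parse_pos(start_txt) if start_txt else None
    if not end_txt:
        return start, None
    if end_txt.startswith("+"):
        # Relative end: same kind as the start, shifted by the given amount.
        if start is None:
            raise ValueError("a '+' end needs a start position")
        delta = int(end_txt[1:], 0)
        if isinstance(start.value, int):
            return start, Pos(start.kind, start.value + delta)
        return start, Pos(start.kind, start.value, start.offset + delta)
    end = parse_pos(end_txt)
    if start is not None and start.kind != end.kind:
        raise ValueError("start and end must be the same kind of position")
    return start, end


if __name__ == "__main__":
    for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(spec, parse_ptrace_range(spec))
    assert parse_ptrace_range("1000:+100") == parse_ptrace_range("1000:1100")
```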

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
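The sample predictor tuples above are written as (N, W, M, X). The -bpred:2lev option, however, takes <l1size> <l2size> <hist_size> <xor>, which is N M W X. The option dump above shows the default "1 1024 8 0", a GAp with M = 1024 > 2^8.

A minimal Python sketch that builds the option string for the GAg, GAp, PAg and gshare samples. A few details are assumptions:

- The function name and the power-of-two sanity check are illustrative.
- PAp is left out because its stated constraint on M is ambiguous about whether N is an entry count or its log2.

```python
def is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def two_level_args(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Build the -bpred:2lev argument string for one of the sample schemes.

    The help text writes sample tuples as (N, W, M, X), but the option
    takes <l1size> <l2size> <hist_size> <xor>, i.e. N M W X.
    """
    if scheme == "GAg":
        n, m, x = 1, 2 ** w, 0
    elif scheme == "GAp":
        if m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        n, x = 1, 0
    elif scheme == "PAg":
        m, x = 2 ** w, 0
    elif scheme == "gshare":
        n, m, x = 1, 2 ** w, 1
    else:
        raise ValueError(f"scheme not covered by this sketch: {scheme}")
    # Sanity check (assumption): keep both table sizes powers of two.
    if not (is_pow2(n) and is_pow2(m)):
        raise ValueError("l1size and l2size should be powers of two")
    return f"{n} {m} {w} {x}"


if __name__ == "__main__":
    print("-bpred:2lev", two_level_args("gshare", 8))        # -bpred:2lev 1 256 8 1
    print("-bpred:2lev", two_level_args("GAp", 8, m=1024))   # -bpred:2lev 1 1024 8 0
    print("-bpred:2lev", two_level_args("PAg", 10, n=1024))  # -bpred:2lev 1024 1024 10 0
```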

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
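A cache's capacity follows from its <config> string as nsets x bsize x assoc. The TLB options use the same format, and here <bsize> is read as the page size, so the product is the TLB's reach.

Below is a small Python sketch of a parser, applied to the configuration of this run. The CacheConfig class and function names are illustrative, not part of SimpleScalar. Because this run sets -cache:il2 to dl2, the single ul2 level serves misses from both il1 and dl1.

```python
from dataclasses import dataclass

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int    # block size in bytes (read as the page size for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        return self.nsets * self.assoc

    @property
    def capacity(self) -> int:
        # Bytes of data held (or, for a TLB, bytes of address space mapped).
        return self.blocks * self.bsize


def parse_cache_config(text: str) -> CacheConfig:
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL_POLICIES[repl])


if __name__ == "__main__":
    for spec in ["il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"]:
        c = parse_cache_config(spec)
        print(f"{c.name:5s} {c.blocks:5d} blocks x {c.bsize:4d} B = "
              f"{c.capacity // 1024} KiB ({c.repl})")
```

For this run, il1 and dl1 come out at 64 KiB each and ul2 at 512 KiB. The 64-entry itlb and 128-entry dtlb map 256 KiB and 512 KiB respectively.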



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264653 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6331415 # total simulation time in cycles
sim_IPC                      1.8292 # instructions per cycle
sim_CPI                      0.5467 # cycles per instruction
sim_exec_BW                  1.9371 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18644375 # cumulative IFQ occupancy
IFQ_fcount                  3832934 # cumulative IFQ full count
ifq_occupancy                2.9447 # avg IFQ occupancy (insn's)
ifq_rate                     1.9371 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5202 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6054 # fraction of time (cycle's) IFQ was full
RUU_count                  76823323 # cumulative RUU occupancy
RUU_fcount                  3231021 # cumulative RUU full count
ruu_occupancy               12.1337 # avg RUU occupancy (insn's)
ruu_rate                     1.9371 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2638 # avg RUU occupant latency (cycle's)
ruu_full                     0.5103 # fraction of time (cycle's) RUU was full
LSQ_count                  32026513 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0583 # avg LSQ occupancy (insn's)
lsq_rate                     1.9371 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6113 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122916932 # total number of slip cycles
avg_sim_slip                10.6132 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
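The formula statistics in the block above are ratios of the raw counters printed alongside them. These include sim_IPC, sim_CPI, sim_exec_BW, sim_IPB, the prediction rates and the miss rates.

A minimal Python sketch that parses `name value # description` lines and recomputes a few of them for the sort run, as a consistency check. The regex and function names are illustrative assumptions, and the counter excerpt has its descriptions trimmed. The reported values are rounded to four decimals, so the comparison allows half a unit in the last place.

```python
import re

# Matches "<name> <value> # <description>" lines in a statistics dump.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text):
    """Map stat name -> float (or the raw string for values like '0x00400000', '96k')."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m is None:
            continue
        name, raw = m.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw
    return stats


def derived_metrics(s):
    """Recompute some formula stats from the raw counters."""
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_exec_BW": s["sim_total_insn"] / s["sim_cycle"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "bpred_bimod.bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


# Counters from the sort run (descriptions trimmed).
SAMPLE = """\
sim_num_insn               11581513 #
sim_num_branches            3128464 #
sim_total_insn             12264653 #
sim_cycle                   6331415 #
sim_IPC                      1.8292 #
sim_CPI                      0.5467 #
sim_exec_BW                  1.9371 #
sim_IPB                      3.7020 #
bpred_bimod.updates         3128464 #
bpred_bimod.dir_hits        2990777 #
bpred_bimod.bpred_dir_rate    0.9560 #
dl1.accesses                4497784 #
dl1.misses                    11205 #
dl1.miss_rate                0.0025 #
ul2.accesses                  18501 #
ul2.misses                     4196 #
ul2.miss_rate                0.2268 #
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    for name, value in derived_metrics(stats).items():
        ok = abs(value - stats[name]) <= 5e-5
        print(f"{name:28s} recomputed={value:.4f} reported={stats[name]:.4f} "
              f"{'ok' if ok else 'MISMATCH'}")
```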

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:13:24 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375020 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9124462 # total simulation time in cycles
sim_IPC                      1.4599 # instructions per cycle
sim_CPI                      0.6850 # cycles per instruction
sim_exec_BW                  1.4658 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  35562210 # cumulative IFQ occupancy
IFQ_fcount                  8739908 # cumulative IFQ full count
ifq_occupancy                3.8975 # avg IFQ occupancy (insn's)
ifq_rate                     1.4658 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6589 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 143158534 # cumulative RUU occupancy
RUU_fcount                  8599012 # cumulative RUU full count
ruu_occupancy               15.6895 # avg RUU occupancy (insn's)
ruu_rate                     1.4658 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.7034 # avg RUU occupant latency (cycle's)
ruu_full                     0.9424 # fraction of time (cycle's) RUU was full
LSQ_count                  74962824 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2156 # avg LSQ occupancy (insn's)
lsq_rate                     1.4658 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6047 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  237836944 # total number of slip cycles
avg_sim_slip                17.8544 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:13:35 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
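The runs recorded here share these options except the memory-system flags (`-mem:width`, `-mem:lat`, `-mem:minBurstLength`) and the benchmark name. The Python sketch below rebuilds the argument list from the `sim: command line:` lines so that a sweep can be regenerated. `BASE_FLAGS` and `build_cmd` are hypothetical names. `-mem:maxBurstLength` and `-mem:minBurstLength` are copied verbatim from the log; they appear to be local extensions and may not be accepted by other sim-outorder builds.

```python
import shlex

# Options shared by every run in this log, copied from the
# "sim: command line:" lines (hypothetical constant name).
BASE_FLAGS = [
    "-fetch:mplat", "8", "-bpred:ras", "16", "-bpred:bimod", "16384",
    "-lsq:size", "32", "-res:imult", "2", "-res:fpmult", "2",
    "-cache:dl1", "dl1:512:64:2:l", "-cache:il1", "il1:512:64:2:l",
    "-cache:dl2", "ul2:1024:64:8:l", "-cache:dl1lat", "3",
    "-cache:il1lat", "3", "-cache:il2lat", "12", "-cache:dl2lat", "12",
    "-mem:maxBurstLength", "8",
]


def build_cmd(benchmark, mem_width, first_chunk, inter_chunk, min_burst):
    """Return the argv list for one run (hypothetical helper).

    The flag order matches the command lines recorded in this log.
    """
    return ["sim-outorder"] + BASE_FLAGS + [
        "-mem:width", str(mem_width),
        "-mem:lat", str(first_chunk), str(inter_chunk),
        "-mem:minBurstLength", str(min_burst),
        "-redir:sim", "tempOutput",
        benchmark,
    ]


if __name__ == "__main__":
    # Example: the same benchmark at two memory bus widths.
    for width in (16, 32):
        print(shlex.join(build_cmd("matrix", width, 60, 3, 4)))
```

The returned list can be passed to `subprocess.run` unchanged; building a list rather than a single shell string avoids quoting problems.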

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle
  count range.  All other range values represent an instruction count
  range.  The second argument, if specified with a `+', indicates a value
  relative to the first argument, e.g., 1000:+100 == 1000:1100.  Program
  symbols may be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
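A minimal Python sketch of the range grammar above, useful for checking a `-ptrace` argument before starting a long run. It is not the simulator's own parser: the `Bound` and `parse_range` names are hypothetical, and treating a `+` end as inheriting the kind of the start (so `@main:+278` is an address offset) is an assumption extrapolated from the `1000:+100 == 1000:1100` example.

```python
from typing import NamedTuple, Optional, Tuple, Union

Value = Union[int, str]  # a number, or a program symbol such as "main"


class Bound(NamedTuple):
    kind: str               # "addr" ('@'), "cycle" ('#') or "inst" (no prefix)
    value: Value
    relative: bool = False  # offset from a symbolic start that was not resolved


def _parse_value(text: str) -> Value:
    if text.isdigit():
        return int(text)                # decimal count or address
    if text.lower().startswith("0x"):
        return int(text, 16)            # hex address such as 0x00400140
    return text                         # otherwise assume a program symbol


def _parse_bound(text: str) -> Optional[Bound]:
    if not text:
        return None                     # open end of the range
    if text[0] == "@":
        return Bound("addr", _parse_value(text[1:]))
    if text[0] == "#":
        return Bound("cycle", _parse_value(text[1:]))
    return Bound("inst", _parse_value(text))


def parse_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    """Split a -ptrace range such as "#0:#1000" or "@main:+278"."""
    if ":" not in spec:
        raise ValueError(f"missing ':' in range {spec!r}")
    start_txt, end_txt = spec.split(":", 1)
    start = _parse_bound(start_txt)
    if not end_txt.startswith("+"):
        return start, _parse_bound(end_txt)
    if start is None:
        raise ValueError("a '+' end needs a start value")
    offset = _parse_value(end_txt[1:])
    if isinstance(start.value, int) and isinstance(offset, int):
        # Numeric start: resolve now, e.g. 1000:+100 -> 1000:1100.
        return start, Bound(start.kind, start.value + offset)
    # Symbolic start (e.g. @main): keep the offset relative.
    return start, Bound(start.kind, offset, relative=True)
```

An empty side yields `None`, so `parse_range(":")` returns `(None, None)`, matching "the entire execution is traced".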

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X  (same order as the -bpred:2lev arguments)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
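The sample table can be turned into `-bpred:2lev` arguments mechanically. The Python sketch below (the `twolev_config` and `to_flag` names are hypothetical) emits values in the flag's `<l1size> <l2size> <hist_size> <xor>` order; by this table, the default `-bpred:2lev 1 1024 8 0` listed above is a GAp with W = 8 and M = 1024. These runs use `-bpred bimod`, so the 2-level settings do not affect the statistics in this log.

```python
def twolev_config(scheme, w, n=1, m=None):
    """Return (l1size, l2size, hist_size, xor) for one sample predictor.

    Values follow the sample table above, in -bpred:2lev argument order.
    """
    if scheme == "GAg":
        return (1, 2 ** w, w, 0)
    if scheme == "gshare":
        return (1, 2 ** w, w, 1)          # xor history with branch address
    if scheme == "PAg":
        return (n, 2 ** w, w, 0)
    if scheme == "GAp":
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        return (1, m, w, 0)
    if scheme == "PAp":
        # The help text gives M == 2^(N+W); taken literally here.
        return (n, 2 ** (n + w), w, 0)
    raise ValueError(f"unknown scheme {scheme!r}")


def to_flag(config):
    """Format a config tuple as a command-line option."""
    return "-bpred:2lev " + " ".join(str(v) for v in config)
```

For example, `to_flag(twolev_config("gshare", 10))` gives `-bpred:2lev 1 1024 10 1`.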

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
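Total capacity follows directly from this format: `<nsets>` x `<bsize>` x `<assoc>` bytes. Applied to this log's settings, the Python sketch below (the `CacheConfig` and `parse_cache_config` names are hypothetical) gives 64 KiB for each L1 (`dl1:512:64:2:l`, `il1:512:64:2:l`) and 512 KiB for the unified L2 (`ul2:1024:64:8:l`). For the TLB configs the same product reads as TLB reach, assuming `<bsize>` is the page size (4096 here).

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity(self) -> int:
        """Bytes covered: sets x block size x ways."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>, e.g. "dl1:512:64:2:l"."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity // 1024} KiB")
```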

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12549212 # total simulation time in cycles
sim_IPC                      1.7036 # instructions per cycle
sim_CPI                      0.5870 # cycles per instruction
sim_exec_BW                  1.7093 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49055558 # cumulative IFQ occupancy
IFQ_fcount                 11644717 # cumulative IFQ full count
ifq_occupancy                3.9091 # avg IFQ occupancy (insn's)
ifq_rate                     1.7093 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2870 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199358222 # cumulative RUU occupancy
RUU_fcount                 12424592 # cumulative RUU full count
ruu_occupancy               15.8861 # avg RUU occupancy (insn's)
ruu_rate                     1.7093 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2940 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63913919 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0931 # avg LSQ occupancy (insn's)
lsq_rate                     1.7093 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9797 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291055020 # total number of slip cycles
avg_sim_slip                13.6139 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto the ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
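Several reported ratios can be recomputed from the raw counters, which is a quick way to sanity-check a parsed log. The Python sketch below parses `<name> <value> # <description>` lines (`parse_stats` and `derived` are hypothetical names) and reproduces `sim_IPC`, `sim_CPI`, `sim_IPB`, `sim_inst_rate`, `ifq_occupancy`, `ifq_latency` and `ruu_latency` for the filter run. Dividing the latencies by `sim_total_insn` is an inference: it matches the printed values here, but it was not taken from the simulator source.

```python
import re

# Lines copied verbatim from the filter run's statistics above.
SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_total_insn             21450114 # total number of instructions executed
sim_cycle                  12549212 # total simulation time in cycles
IFQ_count                  49055558 # cumulative IFQ occupancy
RUU_count                 199358222 # cumulative RUU occupancy
mem.page_mem                   120k # total size of memory pages allocated
"""

# "<name> <value> # <description>"; the value may contain spaces,
# as in "<error: divide by zero>".
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#")


def parse_value(text):
    """Return an int or float when possible, else the raw string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_stats(section):
    """Parse one "simulation statistics" section into a dict."""
    stats = {}
    for line in section.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = parse_value(match.group(2))
    return stats


def derived(stats):
    """Recompute several reported ratios from the raw counters."""
    insn, cycles = stats["sim_num_insn"], stats["sim_cycle"]
    executed = stats["sim_total_insn"]  # includes mis-speculated insts
    return {
        "sim_IPC": insn / cycles,
        "sim_CPI": cycles / insn,
        "sim_IPB": insn / stats["sim_num_branches"],
        "sim_inst_rate": insn / stats["sim_elapsed_time"],
        # Average occupancy = cumulative count / cycles
        #                   = dispatch rate x average latency (Little's law).
        "ifq_occupancy": stats["IFQ_count"] / cycles,
        "ifq_latency": stats["IFQ_count"] / executed,
        "ruu_latency": stats["RUU_count"] / executed,
    }


if __name__ == "__main__":
    for key, value in derived(parse_stats(SAMPLE)).items():
        print(f"{key:15s} {value:.4f}")
```

Stat names repeat in every run, so a multi-run file should be split on the `sim: ** simulation statistics **` markers before calling `parse_stats`; the option listings also match the pattern and should be left out.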

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:13:49 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861147 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37025766 # total simulation time in cycles
sim_IPC                      0.7524 # instructions per cycle
sim_CPI                      1.3290 # cycles per instruction
sim_exec_BW                  0.7525 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 148037945 # cumulative IFQ occupancy
IFQ_fcount                 37009246 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7525 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3134 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 592153955 # cumulative RUU occupancy
RUU_fcount                 37008536 # cumulative RUU full count
ruu_occupancy               15.9930 # avg RUU occupancy (insn's)
ruu_rate                     0.7525 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.2538 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 180009466 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8617 # avg LSQ occupancy (insn's)
lsq_rate                     0.7525 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4609 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  808672132 # total number of slip cycles
avg_sim_slip                29.0266 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto the ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
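Each statistic line in these blocks has the form <name> <value> # <description>. The *_rate entries are plain ratios of the raw counters; for example, miss_rate = misses/accesses, printed to four decimals. The following is a minimal Python sketch that assumes only that line layout. It parses such lines and recomputes a rate so it can be compared with the printed one. parse_stats and rate are hypothetical helper names, not part of SimpleScalar.

```python
import re

# One stat line: "<name> <value> # <description>". The value stays a string
# because some entries are hex (0x00400000), suffixed (1480k) or error text.
STAT_RE = re.compile(r"^(\S+)\s+(.*?)\s+#\s*(.*)$")


def parse_stats(lines):
    """Return {stat_name: raw_value} for every line shaped like a statistic."""
    stats = {}
    for line in lines:
        m = STAT_RE.match(line.strip())
        # Skip option lines ("-seed ...", "# -config ...") and empty values.
        if m and m.group(1)[0] not in "-#" and m.group(2):
            stats[m.group(1)] = m.group(2)
    return stats


def rate(stats, num, den, digits=4):
    """Recompute num/den rounded like the printed value; None if den is zero."""
    d = float(stats[den])
    return None if d == 0 else round(float(stats[num]) / d, digits)


sample = """\
dtlb.accesses               8653412 # total number of accesses
dtlb.misses                    1073 # total number of misses
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
""".splitlines()

stats = parse_stats(sample)
print(rate(stats, "dtlb.misses", "dtlb.accesses"), stats["dtlb.miss_rate"])
print(rate(stats, "mem.ptab_misses", "mem.ptab_accesses"), stats["mem.ptab_miss_rate"])
```

The values are kept as strings on purpose. The error entry printed for a zero denominator (<error: divide by zero>) still parses, and converting it to a number is left to the caller.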

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:14:12 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
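The -mem:lat pair is <first_chunk> <inter_chunk>, and -mem:width is the memory bus width in bytes. Below is a minimal Python sketch that assumes the stock sim-outorder 3.0 model. In that model, a block of bsize bytes crosses the bus in ceil(bsize / width) chunks and costs first_chunk + inter_chunk * (chunks - 1) cycles. This build also accepts -mem:maxBurstLength and -mem:minBurstLength. Those options are not part of the stock model and are ignored here, so treat the result as an approximation for this simulator. mem_latency is a hypothetical name.

```python
def mem_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to move one block from memory under the stock chunked model.

    Assumption: mirrors sim-outorder 3.0's chunk formula; the burst-length
    options present in this modified build are NOT modeled.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (block_bytes + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes in these runs (ul2:1024:64:8:l).
print(mem_latency(64, 32, 60, 3))   # this run: -mem:width 32 -mem:lat 60 3
print(mem_latency(64, 8, 91, 10))   # first run in the log: -mem:width 8 -mem:lat 91 10
```

Under this model, widening the bus matters as much as lowering the first-chunk latency. A 64-byte block needs 8 chunks on an 8-byte bus but only 2 on a 32-byte bus.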

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
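The range grammar above can be expressed as a small parser. This is an illustration under stated assumptions, not SimpleScalar's own parser:

- Numbers are read as decimal or 0x-prefixed hex.
- A '+' end inherits the kind of the start.
- Program symbols such as main are left as strings, because resolving them needs the program's symbol table.

parse_ptrace_range is a hypothetical name.

```python
KINDS = {"@": "addr", "#": "cycle"}  # no prefix means an instruction count


def _value(text):
    """Numeric value if possible (decimal or 0x-hex), else a program symbol."""
    try:
        return int(text, 0)
    except ValueError:
        return text


def _bound(text):
    """Turn one side of the range into (kind, value), or None if empty."""
    if not text:
        return None
    if text[0] in KINDS:
        return (KINDS[text[0]], _value(text[1:]))
    return ("insn", _value(text))


def parse_ptrace_range(spec):
    """Split '<start>:<end>' into (start, end); either side may be None."""
    if ":" not in spec:
        raise ValueError("pipetrace range must contain ':'")
    lo, hi = spec.split(":", 1)
    start = _bound(lo)
    if hi.startswith("+"):
        if start is None:
            raise ValueError("a '+' end needs a start")
        kind, base = start
        offset = _value(hi[1:])
        if isinstance(base, int) and isinstance(offset, int):
            return start, (kind, base + offset)
        return start, (kind, f"{base}+{offset}")  # symbolic, left unresolved
    return start, _bound(hi)


for example in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100", "@main:+278"):
    print(example, parse_ptrace_range(example))
```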

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
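The bimodal predictor used in these runs (-bpred bimod, -bpred:bimod 16384) indexes a table of 2-bit saturating counters by branch address. gshare, the X = 1 sample row, XORs global branch history into that index. Below is a minimal textbook sketch of both, not SimpleScalar's bpred.c. The following details are assumptions:

- Counters start at "weakly taken".
- The address is shifted by 3 bits, assuming 8-byte PISA instructions.
- The class names and the branch address are hypothetical.

The sketch shows why history helps on a strictly alternating branch, which a bimodal counter predicts correctly only half the time.

```python
BR_SHIFT = 3  # assumption: 8-byte PISA instructions, so low 3 PC bits are zero


class CounterTable:
    """Power-of-two table of 2-bit saturating counters (>= 2 predicts taken)."""

    def __init__(self, size, init=2):  # init=2 ("weakly taken") is an assumption
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.counters = [init] * size
        self.mask = size - 1

    def predict(self, index):
        return self.counters[index & self.mask] >= 2

    def update(self, index, taken):
        i = index & self.mask
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


class Bimodal:
    """Counter table indexed by branch address only."""

    def __init__(self, size):
        self.table = CounterTable(size)

    def index(self, pc):
        return pc >> BR_SHIFT

    def predict(self, pc):
        return self.table.predict(self.index(pc))

    def update(self, pc, taken):
        self.table.update(self.index(pc), taken)


class GShare(Bimodal):
    """Bimodal indexing with W bits of global history XORed in."""

    def __init__(self, history_bits):
        super().__init__(1 << history_bits)
        self.history = 0
        self.hmask = (1 << history_bits) - 1

    def index(self, pc):
        return (pc >> BR_SHIFT) ^ self.history

    def update(self, pc, taken):
        super().update(pc, taken)  # train the counter the prediction used
        self.history = ((self.history << 1) | int(taken)) & self.hmask


def correct_predictions(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number predicted right."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct


PC = 0x00400140  # hypothetical branch address, chosen arbitrarily
alternating = [True, False] * 50
print("bimodal:", correct_predictions(Bimodal(16384), PC, alternating), "/ 100")
print("gshare: ", correct_predictions(GShare(2), PC, alternating), "/ 100")
```

On a loop branch that is taken nine times and then falls through, the bimodal table alone already reaches 90%. The history bits pay off on patterns that alternate or correlate with other branches.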

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
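The <config> fields multiply out to the structure's size: nsets * assoc blocks of bsize bytes each. The -tlb:itlb and -tlb:dtlb options in the dump above use the same format with the page size in the <bsize> field (itlb:16:4096:4:l), so the same product gives a TLB's address reach. Below is a minimal Python sketch applied to this run's configuration. CacheConfig and parse_cache_config are hypothetical names, not SimpleScalar code.

```python
from typing import NamedTuple

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # bytes per block (page size when the config describes a TLB)
    assoc: int
    repl: str    # one of 'l', 'f', 'r'

    @property
    def entries(self):
        """Total blocks (cache) or translations (TLB)."""
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self):
        """Data capacity for a cache; address reach for a TLB."""
        return self.entries * self.bsize


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.entries} entries, {cfg.capacity_bytes // 1024} KiB, "
          f"{REPL_POLICIES[cfg.repl]}")
```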

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6895347 # total simulation time in cycles
sim_IPC                      1.9009 # instructions per cycle
sim_CPI                      0.5261 # cycles per instruction
sim_exec_BW                  1.9069 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25666336 # cumulative IFQ occupancy
IFQ_fcount                  6290981 # cumulative IFQ full count
ifq_occupancy                3.7223 # avg IFQ occupancy (insn's)
ifq_rate                     1.9069 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9520 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105947057 # cumulative RUU occupancy
RUU_fcount                  5727142 # cumulative RUU full count
ruu_occupancy               15.3650 # avg RUU occupancy (insn's)
ruu_rate                     1.9069 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0574 # avg RUU occupant latency (cycle's)
ruu_full                     0.8306 # fraction of time (cycle's) RUU was full
LSQ_count                  32405946 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6997 # avg LSQ occupancy (insn's)
lsq_rate                     1.9069 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4645 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  155403377 # total number of slip cycles
avg_sim_slip                11.8564 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
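The summary metrics of the matrix run are ratios of the raw counters printed alongside them. The formulas below were inferred by matching the printed values, not read from SimpleScalar's source, so treat them as a hedged reconstruction:

- IPC, CPI and avg_sim_slip use committed instructions.
- exec_BW and the queue latencies use executed instructions (sim_total_insn).
- Occupancies and "full" fractions divide cumulative counts by sim_cycle.

One consequence is that each queue's occupancy equals its dispatch rate times its latency (Little's law). For the IFQ, 1.9069 * 1.9520 ≈ 3.7223.

```python
# Raw counters copied from the matrix run's statistics above.
RAW = {
    "sim_num_insn": 13107124,     # committed instructions
    "sim_total_insn": 13148990,   # executed, including mis-speculated
    "sim_num_branches": 1010850,
    "sim_cycle": 6895347,
    "sim_elapsed_time": 8,        # host seconds
    "IFQ_count": 25666336, "IFQ_fcount": 6290981,
    "RUU_count": 105947057, "RUU_fcount": 5727142,
    "LSQ_count": 32405946,
    "sim_slip": 155403377,
}

# Values as printed by sim-outorder for the same run.
PRINTED = {
    "sim_IPC": 1.9009, "sim_CPI": 0.5261, "sim_exec_BW": 1.9069,
    "sim_IPB": 12.9664, "sim_inst_rate": 1638390.5,
    "ifq_occupancy": 3.7223, "ifq_latency": 1.9520, "ifq_full": 0.9124,
    "ruu_occupancy": 15.3650, "ruu_latency": 8.0574, "ruu_full": 0.8306,
    "lsq_occupancy": 4.6997, "lsq_latency": 2.4645,
    "avg_sim_slip": 11.8564,
}


def derived(s):
    """Recompute summary metrics (formulas inferred from the printed values)."""
    cyc, done, execd = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    return {
        "sim_IPC": done / cyc,
        "sim_CPI": cyc / done,
        "sim_exec_BW": execd / cyc,
        "sim_IPB": done / s["sim_num_branches"],
        "sim_inst_rate": done / s["sim_elapsed_time"],
        # occupancy = count / cycles and latency = count / executed insts, so
        # occupancy == (execd / cyc) * latency, i.e. Little's law.
        "ifq_occupancy": s["IFQ_count"] / cyc,
        "ifq_latency": s["IFQ_count"] / execd,
        "ifq_full": s["IFQ_fcount"] / cyc,
        "ruu_occupancy": s["RUU_count"] / cyc,
        "ruu_latency": s["RUU_count"] / execd,
        "ruu_full": s["RUU_fcount"] / cyc,
        "lsq_occupancy": s["LSQ_count"] / cyc,
        "lsq_latency": s["LSQ_count"] / execd,
        "avg_sim_slip": s["sim_slip"] / done,
    }


mismatches = {k: v for k, v in derived(RAW).items() if round(v, 4) != PRINTED[k]}
print("all match" if not mismatches else mismatches)
```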

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:14:20 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264857 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6470243 # total simulation time in cycles
sim_IPC                      1.7900 # instructions per cycle
sim_CPI                      0.5587 # cycles per instruction
sim_exec_BW                  1.8956 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  19156233 # cumulative IFQ occupancy
IFQ_fcount                  3960899 # cumulative IFQ full count
ifq_occupancy                2.9607 # avg IFQ occupancy (insn's)
ifq_rate                     1.8956 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5619 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6122 # fraction of time (cycle's) IFQ was full
RUU_count                  78871973 # cumulative RUU occupancy
RUU_fcount                  3358939 # cumulative RUU full count
ruu_occupancy               12.1900 # avg RUU occupancy (insn's)
ruu_rate                     1.8956 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4307 # avg RUU occupant latency (cycle's)
ruu_full                     0.5191 # fraction of time (cycle's) RUU was full
LSQ_count                  32488527 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0212 # avg LSQ occupancy (insn's)
lsq_rate                     1.8956 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6489 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  125427609 # total number of slip cycles
avg_sim_slip                10.8300 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820212 # total number of accesses
il1.hits                   12819995 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820212 # total number of accesses
itlb.hits                  12820205 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917906 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
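Several of the derived values in the matrix run above are plain ratios or products of the raw counters printed next to them. The sketch below recomputes three of them. It assumes a 4 KB simulator page. That size is not printed in this log, but it matches every page_count/page_mem pair here: 73/292k, 190/760k and 30/120k.

```python
# Recompute a few derived statistics of the "matrix" run from its raw counters.
PAGE_SIZE_BYTES = 4096  # assumption: simulator page size (consistent with this log)

def ratio(num: int, den: int) -> float:
    """Plain ratio, as in the '(i.e., misses/ref)' descriptions."""
    return num / den

ul2_miss_rate = ratio(4196, 18501)          # ul2.misses / ul2.accesses
ptab_miss_rate = ratio(589, 85917906)       # mem.ptab_misses / mem.ptab_accesses
page_mem_kb = 73 * PAGE_SIZE_BYTES // 1024  # mem.page_count * page size, in KB

print(f"ul2.miss_rate      {ul2_miss_rate:.4f}")   # 0.2268
print(f"mem.ptab_miss_rate {ptab_miss_rate:.4f}")  # 0.0000
print(f"mem.page_mem       {page_mem_kb}k")        # 292k
```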

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:14:28 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
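The -mem:lat and -mem:width options above determine how long a cache block takes to arrive from memory. The sketch below assumes the usual SimpleScalar model: the first bus-width chunk costs <first_chunk> cycles, and every further chunk costs <inter_chunk> cycles. The -mem:maxBurstLength/-mem:minBurstLength options seem to come from a modified simulator build. Their effect is not described in this log, so the sketch ignores them. The function name is hypothetical.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to transfer one cache block over the memory bus (assumed model).

    Burst-length options are ignored; only -mem:lat and -mem:width are used.
    """
    chunks = -(-block_size // bus_width)  # ceiling division: bus transfers per block
    return first_chunk + inter_chunk * (chunks - 1)

# 64-byte ul2 blocks (-cache:dl2 ul2:1024:64:8:l) in both configurations
this_run = mem_access_latency(64, bus_width=32, first_chunk=60, inter_chunk=3)
matrix_run = mem_access_latency(64, bus_width=8, first_chunk=91, inter_chunk=10)
print(this_run, matrix_run)  # 63 161
```

Under these assumptions, widening the bus from 8 to 32 bytes cuts an L2 miss from 8 chunks to 2 chunks.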

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
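Here is a minimal parser for the range grammar described above, written as a sketch. It is not SimpleScalar's own range parser, and the names RangePos and parse_ptrace_range are hypothetical. It assumes three things:

- A relative `+<n>` end inherits the kind of the start.
- A relative end is resolved only when the start is numeric.
- A value that is neither decimal nor 0x-hex is a program symbol.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Value = Union[int, str]  # integer literal, or a program symbol such as "main"

@dataclass
class RangePos:
    kind: str               # "addr" (@), "cycle" (#) or "insn" (no prefix)
    value: Value
    relative: bool = False  # True for a "+<n>" end that could not be resolved

def _value(text: str) -> Value:
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)
    return text  # anything else is treated as a program symbol

def _pos(text: str) -> Optional[RangePos]:
    if not text:
        return None  # this end of the range was omitted
    if text[0] == "@":
        return RangePos("addr", _value(text[1:]))
    if text[0] == "#":
        return RangePos("cycle", _value(text[1:]))
    return RangePos("insn", _value(text))

def parse_ptrace_range(spec: str) -> Tuple[Optional[RangePos], Optional[RangePos]]:
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _pos(start_txt)
    if not end_txt.startswith("+"):
        return start, _pos(end_txt)
    # "+<n>" is relative to the start: 1000:+100 == 1000:1100
    offset = _value(end_txt[1:])
    kind = start.kind if start else "insn"
    if start and isinstance(start.value, int) and isinstance(offset, int):
        return start, RangePos(kind, start.value + offset)
    return start, RangePos(kind, offset, relative=True)

print(parse_ptrace_range("1000:+100"))
```

A symbolic start such as `@main:+278` leaves the end marked relative. Resolving it would need the program's symbol table, which this sketch does not model.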

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X  (the order of the -bpred:2lev arguments)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
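The -bpred:2lev option line above takes its fields in the order <l1size> <l2size> <hist_size> <xor>, that is N, M, W, X. The hypothetical helper below builds that option string for four of the sample schemes. PAp is left out because its second-level size depends on how N is counted.

```python
from typing import Optional

def two_level_args(scheme: str, w: int, n: int = 1, m: Optional[int] = None) -> str:
    """Return '-bpred:2lev <l1size> <l2size> <hist_size> <xor>' for a sample scheme.

    w: history (shift register) width W
    n: first-level entries N (used by PAg only)
    m: second-level entries M (used by GAp only; must exceed 2**w)
    """
    if scheme == "GAg":
        n, m, xor = 1, 2 ** w, 0
    elif scheme == "GAp":
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        n, xor = 1, 0
    elif scheme == "PAg":
        m, xor = 2 ** w, 0
    elif scheme == "gshare":
        n, m, xor = 1, 2 ** w, 1
    else:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return f"-bpred:2lev {n} {m} {w} {xor}"

print(two_level_args("GAp", w=8, m=1024))  # -bpred:2lev 1 1024 8 0
```

The GAp call reproduces the default `-bpred:2lev 1 1024 8 0` shown in the option dump, where M = 1024 > 2^8.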

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
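The <config> string above fully determines a cache's capacity: sets x ways x block size. For a TLB, where the "block" is a page, the same product is the TLB reach. The sketch below parses the format. It assumes <bsize> is in bytes, which is consistent with the 4096-byte TLB pages. The class and function names are hypothetical.

```python
from dataclasses import dataclass

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}

@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total bytes covered: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize

def parse_cache_config(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)

# The configurations used by the runs in this log
for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity // 1024} KB, "
          f"{cfg.assoc}-way, {REPL_POLICIES[cfg.repl]}")
```

Under this reading, dl1 and il1 are 64 KB each and ul2 is 512 KB. The 64-entry ITLB reaches 256 KB and the 128-entry DTLB reaches 512 KB.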

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375425 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10081829 # total simulation time in cycles
sim_IPC                      1.3213 # instructions per cycle
sim_CPI                      0.7568 # cycles per instruction
sim_exec_BW                  1.3267 # total instructions (mis-spec + committed) per cycle
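The headline throughput numbers above are simple ratios of the raw counts. As a sketch, here they are recomputed from the fft run's counters:
- sim_IPC and sim_CPI are reciprocals over committed instructions.
- sim_exec_BW also counts mis-speculated instructions (sim_total_insn).
- sim_inst_rate is host-side speed, not simulated performance.

```python
# Raw counters copied from the fft run above.
sim_num_insn = 13320908      # committed instructions
sim_total_insn = 13375425    # executed instructions, incl. mis-speculated ones
sim_num_branches = 387182    # committed branches
sim_cycle = 10081829         # simulated cycles
sim_elapsed_time = 11        # host wall-clock seconds

sim_IPC = sim_num_insn / sim_cycle
sim_CPI = sim_cycle / sim_num_insn          # reciprocal of IPC
sim_exec_BW = sim_total_insn / sim_cycle
sim_IPB = sim_num_insn / sim_num_branches
sim_inst_rate = sim_num_insn / sim_elapsed_time

for name, value in [("sim_IPC", sim_IPC), ("sim_CPI", sim_CPI),
                    ("sim_exec_BW", sim_exec_BW), ("sim_IPB", sim_IPB),
                    ("sim_inst_rate", sim_inst_rate)]:
    print(f"{name:<14}{value:14.4f}")
```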
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  39246017 # cumulative IFQ occupancy
IFQ_fcount                  9660860 # cumulative IFQ full count
ifq_occupancy                3.8927 # avg IFQ occupancy (insns)
ifq_rate                     1.3267 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.9342 # avg IFQ occupant latency (cycles)
ifq_full                     0.9582 # fraction of time (cycles) IFQ was full
RUU_count                 157905706 # cumulative RUU occupancy
RUU_fcount                  9519865 # cumulative RUU full count
ruu_occupancy               15.6624 # avg RUU occupancy (insns)
ruu_rate                     1.3267 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.8057 # avg RUU occupant latency (cycles)
ruu_full                     0.9443 # fraction of time (cycles) RUU was full
LSQ_count                  83739007 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3059 # avg LSQ occupancy (insns)
lsq_rate                     1.3267 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2607 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  261359290 # total number of slip cycles
avg_sim_slip                19.6202 # the average slip between issue and retirement
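Each queue (IFQ, RUU, LSQ) reports a cumulative occupancy count and a full count, and the averages follow from them. Occupancy is count/cycles and the full fraction is fcount/cycles. Latency is occupancy divided by dispatch rate (Little's law), which simplifies to count/total executed instructions. The sketch below checks this against the fft run. The avg_sim_slip line is assumed to be sim_slip / sim_num_insn, which these numbers bear out.

```python
def queue_stats(count: int, fcount: int, cycles: int, total_insn: int) -> dict:
    """Derive per-queue averages from cumulative counters.

    count  - occupancy summed over all cycles (e.g. IFQ_count)
    fcount - number of cycles the queue was full (e.g. IFQ_fcount)
    """
    return {
        "occupancy": count / cycles,    # avg entries present per cycle
        "rate": total_insn / cycles,    # avg dispatch rate (insns/cycle)
        "latency": count / total_insn,  # avg cycles per entry = occupancy / rate
        "full": fcount / cycles,        # fraction of cycles the queue was full
    }

CYCLES, TOTAL_INSN = 10081829, 13375425  # sim_cycle, sim_total_insn (fft run)
ifq = queue_stats(39246017, 9660860, CYCLES, TOTAL_INSN)
ruu = queue_stats(157905706, 9519865, CYCLES, TOTAL_INSN)
lsq = queue_stats(83739007, 0, CYCLES, TOTAL_INSN)
avg_sim_slip = 261359290 / 13320908      # sim_slip / sim_num_insn (assumed)

for name, q in (("ifq", ifq), ("ruu", ruu), ("lsq", lsq)):
    print(name, {k: round(v, 4) for k, v in q.items()})
```

In this run the RUU is full 94% of the time while the LSQ never fills. This suggests that the 16-entry RUU, not the 32-entry LSQ, limits dispatch.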
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen           89860 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
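The predictor rates above are the ratios named in their descriptions. The sketch below also checks an identity that holds in both runs of this section: misses equals updates minus direction hits (387182 - 380716 = 6466 here, 1647342 - 1639058 = 8284 in the filter run). The dictionary layout is only an illustration.

```python
# bimodal predictor counters from the fft run above
bimod = {
    "updates": 387182, "addr_hits": 380510, "dir_hits": 380716, "misses": 6466,
    "jr_hits": 89850, "jr_seen": 89860, "used_ras": 89859, "ras_hits": 89850,
}

addr_rate = bimod["addr_hits"] / bimod["updates"]  # bpred_addr_rate
dir_rate = bimod["dir_hits"] / bimod["updates"]    # bpred_dir_rate
jr_rate = bimod["jr_hits"] / bimod["jr_seen"]      # bpred_jr_rate
ras_rate = bimod["ras_hits"] / bimod["used_ras"]   # ras_rate.PP

# Every committed branch update is either a direction hit or a miss.
assert bimod["misses"] == bimod["updates"] - bimod["dir_hits"]

print(f"{addr_rate:.4f} {dir_rate:.4f} {jr_rate:.4f} {ras_rate:.4f}")
```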
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
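Because il2 points at dl2 in these runs, the unified L2 sees three kinds of traffic: L1 data misses, L1 data writebacks and L1 instruction misses. In both the fft run and the filter run, ul2.accesses equals that sum exactly. The sketch below checks this and recomputes the miss rates. The function name is hypothetical.

```python
def expected_l2_accesses(dl1_misses: int, dl1_writebacks: int, il1_misses: int) -> int:
    """Traffic into a unified L2 from the two L1 caches (il2 -> dl2)."""
    return dl1_misses + dl1_writebacks + il1_misses

fft = expected_l2_accesses(287722, 143444, 727)   # compare ul2.accesses = 431893
flt = expected_l2_accesses(3131, 791, 179)        # filter run: ul2.accesses = 4101

dl1_miss_rate = 287722 / 6169439   # dl1.misses / dl1.accesses
ul2_miss_rate = 32394 / 431893     # ul2.misses / ul2.accesses
print(fft, flt, f"{dl1_miss_rate:.4f}", f"{ul2_miss_rate:.4f}")
```

The L2 miss rate is local: it is measured per L2 access, not per L1 reference. This helps explain why the filter run reports 0.6069 even though its dl1 miss rate is only 0.0006.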
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
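Comparing runs is easier once each statistics line (`name value # description`) is loaded into a dictionary. The sketch below is a hypothetical parser for the line shapes in this log:
- plain integers and floats
- 0x-prefixed addresses
- `k`-suffixed sizes, assumed to mean KiB
- `<error: divide by zero>`, mapped to None

Option lines start with `-` or `#` and are deliberately skipped.

```python
import re
from typing import Dict, Iterable, Union, Optional

StatValue = Optional[Union[int, float, str]]

# A stat name, whitespace, one value token (or an <error: ...> marker), then '#'.
STAT_LINE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value><error:[^>]*>|\S+)\s+#")

def parse_value(text: str) -> StatValue:
    if text.startswith("<error"):
        return None                     # e.g. "<error: divide by zero>"
    if text.startswith("0x"):
        return int(text, 16)            # segment bases, entry point
    if text.endswith("k") and text[:-1].isdigit():
        return int(text[:-1]) * 1024    # e.g. mem.page_mem "760k" (assumed KiB)
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text                     # keep anything else verbatim

def parse_stats(lines: Iterable[str]) -> Dict[str, StatValue]:
    stats: Dict[str, StatValue] = {}
    for line in lines:
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m.group("name")] = parse_value(m.group("value"))
    return stats

sample = [
    "sim_IPC                      1.3213 # instructions per cycle",
    "mem.page_mem                   760k # total size of memory pages allocated",
    "ld_prog_entry            0x00400140 # program entry point (initial PC)",
    "bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate",
    "-seed                       1 # random number generator seed (0 for timer seed)",
    "sim_num_stores         2923038.0000 # total number of stores committed",
]
print(parse_stats(sample))
```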

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:14:39 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12663758 # total simulation time in cycles
sim_IPC                      1.6882 # instructions per cycle
sim_CPI                      0.5923 # cycles per instruction
sim_exec_BW                  1.6938 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49478246 # cumulative IFQ occupancy
IFQ_fcount                 11750389 # cumulative IFQ full count
ifq_occupancy                3.9071 # avg IFQ occupancy (insns)
ifq_rate                     1.6938 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3067 # avg IFQ occupant latency (cycles)
ifq_full                     0.9279 # fraction of time (cycles) IFQ was full
RUU_count                 201051473 # cumulative RUU occupancy
RUU_fcount                 12530264 # cumulative RUU full count
ruu_occupancy               15.8761 # avg RUU occupancy (insns)
ruu_rate                     1.6938 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3730 # avg RUU occupant latency (cycles)
ruu_full                     0.9895 # fraction of time (cycles) RUU was full
LSQ_count                  64444931 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0889 # avg LSQ occupancy (insns)
lsq_rate                     1.6938 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0044 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  293278977 # total number of slip cycles
avg_sim_slip                13.7179 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
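In the filter run no non-RAS JRs were seen, so the simulator prints `<error: divide by zero>` for that rate. When these rates are recomputed offline, a guarded ratio avoids the same failure. The sketch below (hypothetical helper names) returns None for an empty denominator.

```python
from typing import Optional

def safe_rate(hits: int, seen: int) -> Optional[float]:
    """hits/seen, or None when nothing was seen (the simulator's divide-by-zero case)."""
    return hits / seen if seen else None

def fmt(rate: Optional[float]) -> str:
    return "n/a" if rate is None else f"{rate:.4f}"

print(fmt(safe_rate(45, 52)))  # bpred_jr_rate: 0.8654
print(fmt(safe_rate(0, 0)))    # bpred_jr_non_ras_rate.PP: n/a
```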
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
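Each statistic above is printed as `name value # description`. The following minimal Python sketch loads such a block into a dictionary. It assumes one statistic per line and handles the value forms that appear in this log: integers, decimals, `0x` addresses, `k`-suffixed sizes, and the `<error: divide by zero>` marker. `parse_stats` and `parse_value` are hypothetical helper names, not part of SimpleScalar.

```python
import re

# name, then a value (which may contain spaces, e.g. "<error: divide by zero>"),
# then "# description". The value match is non-greedy so it stops at the first " #".
STAT_LINE = re.compile(r"^(?P<name>\S+)\s+(?P<value>.+?)\s+#\s*(?P<desc>.*)$")


def parse_value(text):
    """Convert a printed statistic value to a Python value."""
    if text.startswith("<error"):
        return None                      # the simulator could not evaluate the formula
    if text.startswith("0x"):
        return int(text, 16)             # segment bases and entry points
    if re.fullmatch(r"\d+k", text):
        return int(text[:-1]) * 1024     # e.g. mem.page_mem "120k" (assumed KiB)
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    try:
        return float(text)               # rates, and counters printed as x.0000
    except ValueError:
        return text                      # anything else, e.g. a cache config string


def parse_stats(lines):
    """Return {name: value} for every line shaped like a statistic."""
    stats = {}
    for line in lines:
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group("name")] = parse_value(match.group("value"))
    return stats
```

Pass it the lines between `sim: ** simulation statistics **` and the next separator. Passing the option dump as well would also yield entries, because those lines share the same layout.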

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:14:53 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addrs (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
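The `-mem:lat` pair above gives a first-chunk latency and an inter-chunk latency, and `-mem:width` gives the number of bytes moved per chunk. In stock SimpleScalar 3.0, sim-outorder charges a block fetched from memory as first + inter * (chunks - 1), where chunks is the block size divided by the bus width, rounded up. The Python sketch below applies that formula to the 64-byte `ul2` blocks configured here. It assumes the stock formula. The `-mem:maxBurstLength` and `-mem:minBurstLength` options in this build are not part of the stock release and are not modelled, so treat the results as a baseline only.

```python
def mem_access_latency(block_bytes, bus_width_bytes, first_chunk, inter_chunk):
    """Stock-style latency (in cycles) to move one cache block from memory.

    Assumption: the burst-length limits of this modified build are ignored.
    """
    # Number of bus transfers needed for one block, rounded up.
    chunks = (block_bytes + bus_width_bytes - 1) // bus_width_bytes
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks under the memory settings that appear in this log.
print(mem_access_latency(64, 32, 60, 3))   # -mem:width 32 -mem:lat 60 3  -> 63
print(mem_access_latency(64, 64, 60, 3))   # -mem:width 64 -mem:lat 60 3  -> 60
print(mem_access_latency(64, 8, 91, 10))   # -mem:width 8  -mem:lat 91 10 -> 161
```

Widening the bus from 32 to 64 bytes removes the single inter-chunk charge for a 64-byte block. Any further difference between runs would have to come from elsewhere, such as this build's burst-length handling.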

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
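The range grammar can be illustrated with a small Python parser. This is a sketch under stated assumptions, not the simulator's own code. It labels each endpoint `addr` for `@`, `cycle` for `#`, and `inst` otherwise. It accepts decimal or `0x` numbers and leaves program symbols such as `main` unresolved. It folds a `+` end into an absolute value only when the start is numeric. `parse_ptrace_range` and its tuple layout are hypothetical.

```python
def _parse_endpoint(text):
    """Return (kind, value, relative) for one endpoint, or None if omitted."""
    if text == "":
        return None
    kind, relative = "inst", False
    if text[0] == "@":
        kind, text = "addr", text[1:]
    elif text[0] == "#":
        kind, text = "cycle", text[1:]
    elif text[0] == "+":
        relative, text = True, text[1:]
    try:
        value = int(text, 0)          # decimal or 0x-prefixed hex
    except ValueError:
        value = text                  # assumed to be a program symbol
    return (kind, value, relative)


def parse_ptrace_range(spec):
    """Parse '<start>:<end>' into a pair of endpoint tuples (either may be None)."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    start, end = _parse_endpoint(start_text), _parse_endpoint(end_text)
    # A '+' end is relative to the start and shares its kind: 1000:+100 == 1000:1100.
    if end is not None and end[2] and start is not None:
        if isinstance(start[1], int):
            end = (start[0], start[1] + end[1], False)
        else:
            end = (start[0], end[1], True)   # symbolic start: keep offset relative
    return start, end


print(parse_ptrace_range("1000:+100"))
print(parse_ptrace_range("@main:+278"))
```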

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
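The following small Python model makes the N/M/W/X parameters concrete. It has N history shift registers of W bits, and these select among M two-bit saturating counters. X chooses between XOR-ing the branch address into the second-level index (gshare-style) and placing the address bits above the history bits. The constructor follows the `-bpred:2lev` argument order (`<l1size> <l2size> <hist_size> <xor>`). This is an illustrative sketch, not a port of SimpleScalar's predictor code. The class name, the weakly-taken initial counter value and the 3-bit PC shift (assuming 8-byte PISA instructions) are all assumptions.

```python
class TwoLevelPredictor:
    """Illustrative two-level adaptive predictor configured as N, M, W, X."""

    def __init__(self, l1size, l2size, hist_width, use_xor):
        # Table sizes must be powers of two so that masking yields a valid index.
        assert l1size & (l1size - 1) == 0 and l2size & (l2size - 1) == 0
        self.l1size, self.l2size = l1size, l2size
        self.hist_width, self.use_xor = hist_width, use_xor
        self.shiftregs = [0] * l1size      # N first-level history registers
        self.counters = [2] * l2size       # M 2-bit counters, start weakly taken

    def _indices(self, pc):
        addr = pc >> 3  # assumption: drop the low bits of 8-byte instructions
        l1 = addr & (self.l1size - 1)
        hist = self.shiftregs[l1]
        if self.use_xor:
            l2 = hist ^ addr                        # X = 1: gshare-style XOR
        else:
            l2 = hist | (addr << self.hist_width)   # X = 0: address above history
        return l1, l2 & (self.l2size - 1)

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.counters[self._indices(pc)[1]] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into the history."""
        l1, l2 = self._indices(pc)
        c = self.counters[l2]
        self.counters[l2] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.hist_width) - 1
        self.shiftregs[l1] = ((self.shiftregs[l1] << 1) | int(taken)) & mask


# GAg with W = 2 (N = 1, M = 2^W = 4, X = 0) learns a strictly alternating branch.
gag = TwoLevelPredictor(1, 4, 2, False)
pc = 0x00400140
for i in range(20):
    gag.update(pc, i % 2 == 0)
hits = 0
for i in range(20, 30):
    taken = i % 2 == 0
    hits += gag.predict(pc) == taken
    gag.update(pc, taken)
print(hits)  # 10
```

A bimodal predictor like the one used in these runs (`-bpred bimod`) has no history. It would keep mispredicting this alternating pattern, which is the case two-level schemes are meant to capture.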

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
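Each `<config>` string fixes the set count, block size and associativity, so the capacity and the address-bit split follow directly. Capacity is nsets * bsize * assoc, with log2(bsize) offset bits and log2(nsets) index bits. The Python sketch below assumes power-of-two sizes, as in every config in this log. The same arithmetic gives the reach of a TLB config, where the "block" is a 4096-byte page. `CacheConfig` is a hypothetical helper.

```python
from dataclasses import dataclass

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @classmethod
    def parse(cls, text):
        """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
        name, nsets, bsize, assoc, repl = text.split(":")
        if repl not in REPL_POLICIES:
            raise ValueError(f"unknown replacement policy {repl!r}")
        return cls(name, int(nsets), int(bsize), int(assoc), repl)

    @property
    def capacity(self):
        """Bytes of data held (for a TLB: bytes of address space mapped)."""
        return self.nsets * self.bsize * self.assoc

    @property
    def offset_bits(self):
        return self.bsize.bit_length() - 1   # log2(bsize), power of two assumed

    @property
    def index_bits(self):
        return self.nsets.bit_length() - 1   # log2(nsets), power of two assumed


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = CacheConfig.parse(spec)
    print(c.name, c.capacity // 1024, "KiB", c.offset_bits, c.index_bits,
          REPL_POLICIES[c.repl])
```

For the runs in this log, this gives a 64 KiB dl1 (and il1), a 512 KiB ul2, and a data TLB that maps 512 KiB.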

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
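The following Python sketch shows how the level-pointing rule resolves. Given the `-cache:il1`, `-cache:il2`, `-cache:dl1` and `-cache:dl2` values, it reports which named cache backs each level. `resolve_levels` and its dictionary input are hypothetical. Only the `dl1`, `dl2` and `none` keywords come from the help text above.

```python
def resolve_levels(opts):
    """Map each cache level to the name of the cache that actually serves it.

    opts holds the option values for the keys "il1", "il2", "dl1" and "dl2".
    """
    def cache_name(value):
        # A full <config> string defines its own cache, named by its first field.
        return None if value == "none" else value.split(":")[0]

    resolved = {"dl1": cache_name(opts["dl1"]), "dl2": cache_name(opts["dl2"])}
    for level in ("il1", "il2"):
        value = opts[level]
        if value in ("dl1", "dl2"):
            resolved[level] = resolved[value]   # pointed at the data hierarchy
        else:
            resolved[level] = cache_name(value)
    return resolved


# The configuration used in these runs: il2 points at dl2, giving a unified ul2.
print(resolve_levels({
    "il1": "il1:512:64:2:l", "il2": "dl2",
    "dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l",
}))
```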



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861555 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  39332445 # total simulation time in cycles
sim_IPC                      0.7083 # instructions per cycle
sim_CPI                      1.4118 # cycles per instruction
sim_exec_BW                  0.7084 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 157222025 # cumulative IFQ occupancy
IFQ_fcount                 39305266 # cumulative IFQ full count
ifq_occupancy                3.9973 # avg IFQ occupancy (insns)
ifq_rate                     0.7084 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.6430 # avg IFQ occupant latency (cycles)
ifq_full                     0.9993 # fraction of time (cycles) IFQ was full
RUU_count                 628892468 # cumulative RUU occupancy
RUU_fcount                 39304454 # cumulative RUU full count
ruu_occupancy               15.9892 # avg RUU occupancy (insns)
ruu_rate                     0.7084 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.5721 # avg RUU occupant latency (cycles)
ruu_full                     0.9993 # fraction of time (cycles) RUU was full
LSQ_count                 191109667 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8588 # avg LSQ occupancy (insns)
lsq_rate                     0.7084 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.8593 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  856510642 # total number of slip cycles
avg_sim_slip                30.7437 # the average slip between issue and retirement
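The derived pipeline figures above are ratios of the raw counters. The relations below reproduce the alphaBlend values to within the printed four-decimal rounding. They were reconstructed from the numbers rather than quoted from the simulator source, so treat them as a consistency check. Queue latency is the cumulative occupancy divided by the instructions that passed through, which by Little's law equals occupancy divided by throughput.

```python
# Raw counters copied from the alphaBlend statistics above.
sim_num_insn = 27859692        # committed
sim_total_insn = 27861555      # executed, including mis-speculated
sim_num_branches = 481706
sim_cycle = 39332445
IFQ_count, IFQ_fcount = 157222025, 39305266
RUU_count = 628892468
LSQ_count = 191109667
sim_slip = 856510642

derived = {
    "sim_IPC": sim_num_insn / sim_cycle,          # committed insts per cycle
    "sim_CPI": sim_cycle / sim_num_insn,
    "sim_exec_BW": sim_total_insn / sim_cycle,    # counts wrong-path insts too
    "sim_IPB": sim_num_insn / sim_num_branches,
    "ifq_occupancy": IFQ_count / sim_cycle,       # average entries per cycle
    "ifq_latency": IFQ_count / sim_total_insn,    # occupancy / dispatch rate
    "ifq_full": IFQ_fcount / sim_cycle,
    "ruu_latency": RUU_count / sim_total_insn,
    "lsq_latency": LSQ_count / sim_total_insn,
    "avg_sim_slip": sim_slip / sim_num_insn,
}
for name, value in derived.items():
    print(f"{name:15s} {value:.4f}")
```

With a 16-entry RUU that is full 99.93% of the time and an IPC near 0.71, each instruction spends about 22.6 cycles in the window. This points to long-latency memory operations rather than the front end.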
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              81 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
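With il2 pointed at dl2, the unified `ul2` receives exactly the traffic that leaves the first level: every `il1` miss, every `dl1` miss, and every `dl1` writeback (the L1 data cache is write-back, as its nonzero writeback count shows). The counters above satisfy this identity exactly, and the matrix run later in this log does too (178 + 5727 + 492 = 6397). The identity is inferred from these numbers rather than taken from SimpleScalar documentation. The following Python sketch checks it and recomputes the per-cache ("local") rates.

```python
# Counters copied from the alphaBlend statistics above.
il1_misses = 211
dl1_accesses, dl1_misses, dl1_writebacks = 8653398, 2572213, 957274
ul2_accesses, ul2_misses = 3529698, 68321

# Inferred identity: a unified L2 under write-back L1s sees L1 misses plus L1 writebacks.
expected_ul2 = il1_misses + dl1_misses + dl1_writebacks
print(expected_ul2 == ul2_accesses)

# Local rates as the simulator prints them: events per reference to that cache.
print(f"dl1 miss rate {dl1_misses / dl1_accesses:.4f}")
print(f"dl1 wb rate   {dl1_writebacks / dl1_accesses:.4f}")
print(f"ul2 miss rate {ul2_misses / ul2_accesses:.4f}")
```

The low `ul2.miss_rate` is relative to L2 references only. Nearly 30% of data references still pay the 12-cycle L2 latency, which fits the high RUU occupancy above.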
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
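The memory statistics are consistent with 4096-byte simulator pages. `mem.page_mem` equals `mem.page_count` times 4 KiB in both runs whose memory statistics appear here: 370 pages give 1480k above, and 30 pages give 120k in the preceding run. The page-table miss rate is misses divided by accesses. The following short Python check assumes the 4096-byte page size, which matches the page size in the TLB configs.

```python
PAGE_BYTES = 4096  # assumption: same as the 4096-byte pages in the TLB configs


def page_mem_kib(page_count, page_bytes=PAGE_BYTES):
    """Simulated memory allocated, in KiB, as reported by mem.page_mem."""
    return page_count * page_bytes // 1024


print(page_mem_kib(370))               # alphaBlend run -> 1480
print(page_mem_kib(30))                # preceding run  -> 120
print(f"{2888570 / 152017734:.4f}")    # mem.ptab_miss_rate -> 0.0190
```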

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:15:17 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addrs (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6777567 # total simulation time in cycles
sim_IPC                      1.9339 # instructions per cycle
sim_CPI                      0.5171 # cycles per instruction
sim_exec_BW                  1.9401 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25237216 # cumulative IFQ occupancy
IFQ_fcount                  6183701 # cumulative IFQ full count
ifq_occupancy                3.7236 # avg IFQ occupancy (insns)
ifq_rate                     1.9401 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9193 # avg IFQ occupant latency (cycles)
ifq_full                     0.9124 # fraction of time (cycles) IFQ was full
RUU_count                 104225897 # cumulative RUU occupancy
RUU_fcount                  5619862 # cumulative RUU full count
ruu_occupancy               15.3781 # avg RUU occupancy (insns)
ruu_rate                     1.9401 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9265 # avg RUU occupant latency (cycles)
ruu_full                     0.8292 # fraction of time (cycles) RUU was full
LSQ_count                  31830906 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6965 # avg LSQ occupancy (insns)
lsq_rate                     1.9401 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4208 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  153108257 # total number of slip cycles
avg_sim_slip                11.6813 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
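The branch-prediction rates are simple ratios. The `<error: divide by zero>` entry above is what the simulator prints when a denominator is zero: this matrix run saw no non-RAS indirect jumps (`jr_non_ras_seen.PP` is 0). The following Python sketch recomputes the rates and returns `None` instead of failing on an empty denominator. `safe_rate` is a hypothetical helper.

```python
def safe_rate(hits, total):
    """hits/total, or None when total is zero (the simulator prints an error instead)."""
    return None if total == 0 else hits / total


# Counters copied from the matrix run above.
bpred = {
    "updates": 1010850, "addr_hits": 1000567, "dir_hits": 1000659,
    "jr_hits": 46, "jr_seen": 52,
    "jr_non_ras_hits": 0, "jr_non_ras_seen": 0,
    "ras_hits": 46, "used_ras": 52,
}
rates = {
    "bpred_addr_rate": safe_rate(bpred["addr_hits"], bpred["updates"]),
    "bpred_dir_rate": safe_rate(bpred["dir_hits"], bpred["updates"]),
    "bpred_jr_rate": safe_rate(bpred["jr_hits"], bpred["jr_seen"]),
    "bpred_jr_non_ras_rate": safe_rate(bpred["jr_non_ras_hits"], bpred["jr_non_ras_seen"]),
    "ras_rate": safe_rate(bpred["ras_hits"], bpred["used_ras"]),
}
for name, value in rates.items():
    print(name, "n/a" if value is None else f"{value:.4f}")

# The printed miss count is updates minus direction hits.
print(bpred["updates"] - bpred["dir_hits"])
```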
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
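Each statistic line above has the form "name value # description". The following is a minimal sketch for collecting those lines into a dictionary so that runs can be compared programmatically. The helper names parse_value and parse_stats are hypothetical and are not part of SimpleScalar. The sketch assumes that values are decimal integers, decimals, 0x hex addresses, or sizes with a "k" suffix, which it reads as KiB. Any other value is kept as a raw string. Option lines ("-name value # ...") and commented-out options are skipped.

```python
import re

# A stat line looks like "il1.hits   13189346 # total number of hits".
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#\s*(.*)$")


def parse_value(text):
    """Convert a stat value to int or float when it has a known format."""
    if text.startswith("0x"):
        return int(text, 16)              # segment addresses, e.g. 0x00400000
    if text.endswith("k") and text[:-1].isdigit():
        return int(text[:-1]) * 1024      # assumption: "96k" means 96 KiB
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)                # rates, and values such as "1880358.0000"
    except ValueError:
        return text                       # unknown format: keep the raw string


def parse_stats(lines):
    """Return {stat_name: value} for the statistic lines of one run."""
    stats = {}
    for line in lines:
        text = line.strip()
        # Skip commented-out options ("# -config ..."), options ("-ruu:size 16 # ...")
        # and the dashed separator between runs.
        if text.startswith("#") or text.startswith("-"):
            continue
        match = STAT_LINE.match(text)
        if match:
            name, value, _description = match.groups()
            stats[name] = parse_value(value)
    return stats


if __name__ == "__main__":
    sample = [
        "il1.hits                   13189346 # total number of hits",
        "il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)",
        "mem.page_mem                    96k # total size of memory pages allocated",
        "-ruu:size                  16 # register update unit (RUU) size",
    ]
    print(parse_stats(sample))
```

Because a redirected output file such as tempOutput can hold several runs back to back, later runs would overwrite earlier keys. It is safer to split the file on the dashed separator line and call parse_stats once per run.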

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:15:25 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
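The cache and TLB options above share the format <name>:<nsets>:<bsize>:<assoc>:<repl>. The replacement policy is 'l' for LRU, 'f' for FIFO, or 'r' for random. Capacity is nsets x bsize x assoc:

- dl1 and il1 (512:64:2) each hold 64 KiB.
- ul2 (1024:64:8) holds 512 KiB.
- For the TLBs, the block size is the 4096-byte page. itlb has 16 sets x 4 ways = 64 entries, which map 256 KiB. dtlb has 128 entries, which map 512 KiB.

The sketch below performs that arithmetic. The names decode_cache_config and CacheConfig are hypothetical. The sketch does not handle the "none", "dl1" or "dl2" aliases that the il1/il2 options also accept.

```python
from dataclasses import dataclass

POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (the page size for a TLB)
    assoc: int
    policy: str

    @property
    def blocks(self) -> int:
        """Total number of blocks (number of entries for a TLB)."""
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        """Cache capacity, or the memory reach of a TLB."""
        return self.blocks * self.bsize


def decode_cache_config(spec: str) -> CacheConfig:
    """Decode '<name>:<nsets>:<bsize>:<assoc>:<repl>'; aliases like 'dl2' are not handled."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), POLICIES[repl])


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        cfg = decode_cache_config(spec)
        print(f"{cfg.name}: {cfg.blocks} blocks, {cfg.capacity_bytes // 1024} KiB, {cfg.policy}")
```

Run as a script, it prints 1024 blocks and 64 KiB for dl1, 8192 blocks and 512 KiB for ul2, 64 entries and 256 KiB for itlb, and 128 entries and 512 KiB for dtlb.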

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264617 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6306917 # total simulation time in cycles
sim_IPC                      1.8363 # instructions per cycle
sim_CPI                      0.5446 # cycles per instruction
sim_exec_BW                  1.9446 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18554051 # cumulative IFQ occupancy
IFQ_fcount                  3810353 # cumulative IFQ full count
ifq_occupancy                2.9419 # avg IFQ occupancy (insn's)
ifq_rate                     1.9446 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5128 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6042 # fraction of time (cycle's) IFQ was full
RUU_count                  76461838 # cumulative RUU occupancy
RUU_fcount                  3208449 # cumulative RUU full count
ruu_occupancy               12.1235 # avg RUU occupancy (insn's)
ruu_rate                     1.9446 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2343 # avg RUU occupant latency (cycle's)
ruu_full                     0.5087 # fraction of time (cycle's) RUU was full
LSQ_count                  31945000 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0651 # avg LSQ occupancy (insn's)
lsq_rate                     1.9446 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6046 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122473934 # total number of slip cycles
avg_sim_slip                10.5750 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
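The derived statistics of the sort run above can be recomputed from its raw counters. The printed values are consistent with the following formulas:

- sim_IPC = sim_num_insn / sim_cycle
- sim_CPI = sim_cycle / sim_num_insn
- sim_IPB = sim_num_insn / sim_num_branches
- For each of the IFQ, RUU and LSQ: occupancy = count / sim_cycle, rate = sim_total_insn / sim_cycle, and latency = count / sim_total_insn.

Occupancy is therefore exactly rate x latency. This is Little's law: the average queue population equals the arrival rate times the average time spent in the queue.

The sketch below checks these relationships against counters copied from the run. The formulas are inferred from the printed numbers, and derived_metrics is a hypothetical helper.

```python
import math

# Raw counters copied from the "sort" run above.
SORT_RUN = {
    "sim_num_insn": 11581513,
    "sim_total_insn": 12264617,
    "sim_num_branches": 3128464,
    "sim_cycle": 6306917,
    "IFQ_count": 18554051,
    "RUU_count": 76461838,
    "LSQ_count": 31945000,
}


def derived_metrics(s):
    """Recompute derived stats from raw counters (formulas inferred from the log)."""
    cycles = s["sim_cycle"]
    executed = s["sim_total_insn"]   # committed plus mis-speculated instructions
    m = {
        "sim_IPC": s["sim_num_insn"] / cycles,
        "sim_CPI": cycles / s["sim_num_insn"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
    }
    for queue in ("IFQ", "RUU", "LSQ"):
        count = s[f"{queue}_count"]            # occupancy summed over all cycles
        q = queue.lower()
        m[f"{q}_occupancy"] = count / cycles   # average entries in the queue
        m[f"{q}_rate"] = executed / cycles     # average dispatch rate
        m[f"{q}_latency"] = count / executed   # average cycles spent per entry
    return m


if __name__ == "__main__":
    m = derived_metrics(SORT_RUN)
    for name in ("sim_IPC", "sim_CPI", "ruu_occupancy", "ruu_latency"):
        print(f"{name:15s} {m[name]:.4f}")
    # Little's law: occupancy == rate * latency for every queue.
    for q in ("ifq", "ruu", "lsq"):
        assert math.isclose(m[f"{q}_occupancy"], m[f"{q}_rate"] * m[f"{q}_latency"])
```

Rounded to four decimals, the results match the logged sim_IPC (1.8363), sim_CPI (0.5446), ruu_occupancy (12.1235) and ruu_latency (6.2343).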

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:15:33 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
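In the stock sim-outorder, a main-memory access for a block of bsize bytes takes first_chunk + inter_chunk x (chunks - 1) cycles, where chunks = ceil(bsize / mem:width). Applying that model to the 64-byte ul2 blocks configured here gives the following:

- This run uses -mem:lat 60 3 with -mem:width 64. A block moves in a single chunk, which takes 60 cycles.
- The matrix run uses -mem:lat 91 10 with -mem:width 8. A block needs 8 chunks, which takes 91 + 10 x 7 = 161 cycles.

The -mem:maxBurstLength and -mem:minBurstLength options do not appear in the stock 3.0 release, so this build is presumably modified. The sketch below therefore ignores them and should be read as an approximation. The name mem_access_latency is a hypothetical function that mirrors the stock formula.

```python
def mem_access_latency(block_bytes, bus_width_bytes, first_chunk, inter_chunk):
    """Cycles to move one block from main memory, stock sim-outorder style.

    Assumption: the burst-length limits of this modified build are not modeled.
    """
    if block_bytes <= 0 or bus_width_bytes <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (block_bytes + bus_width_bytes - 1) // bus_width_bytes  # ceil division
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # 64-byte ul2 blocks, as configured by -cache:dl2 ul2:1024:64:8:l.
    print(mem_access_latency(64, 64, 60, 3))   # this run: prints 60
    print(mem_access_latency(64, 8, 91, 10))   # matrix run: prints 161
```

The wide 64-byte bus makes the inter-chunk latency irrelevant for these block sizes. With the narrow 8-byte bus, the inter-chunk term dominates the cost of each ul2 miss.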

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13374948 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   8955527 # total simulation time in cycles
sim_IPC                      1.4875 # instructions per cycle
sim_CPI                      0.6723 # cycles per instruction
sim_exec_BW                  1.4935 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  34912176 # cumulative IFQ occupancy
IFQ_fcount                  8577399 # cumulative IFQ full count
ifq_occupancy                3.8984 # avg IFQ occupancy (insn's)
ifq_rate                     1.4935 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6103 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9578 # fraction of time (cycle's) IFQ was full
RUU_count                 140556261 # cumulative RUU occupancy
RUU_fcount                  8436521 # cumulative RUU full count
ruu_occupancy               15.6949 # avg RUU occupancy (insn's)
ruu_rate                     1.4935 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.5089 # avg RUU occupant latency (cycle's)
ruu_full                     0.9420 # fraction of time (cycle's) RUU was full
LSQ_count                  73414149 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1976 # avg LSQ occupancy (insn's)
lsq_rate                     1.4935 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.4889 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  233686176 # total number of slip cycles
avg_sim_slip                17.5428 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
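The cache statistics above are plain ratios over accesses ("/ref" in the stat comments). The L2 counters can also be cross-checked against the L1 counters. The minimal sketch below assumes that each L1 miss and each dirty L1 writeback becomes exactly one access to the unified L2 (il2 is pointed at dl2 in these runs). That assumption happens to match the counters in this log, but it is inferred, not taken from the simulator source. The helper names are hypothetical.

```python
# Hypothetical helpers: recompute SimpleScalar-style cache rates from raw counters
# and check how L1 traffic reaches the unified L2 (il2 -> dl2 in these runs).

def cache_rates(accesses, hits, misses, replacements, writebacks):
    """Return the *_rate stats as ratios over accesses ("/ref" in the log comments)."""
    assert hits + misses == accesses, "every access is either a hit or a miss"
    return {
        "miss_rate": misses / accesses,
        "repl_rate": replacements / accesses,
        "wb_rate": writebacks / accesses,
    }

def expected_l2_accesses(dl1_misses, dl1_writebacks, il1_misses):
    """ASSUMED model: each L1 miss and each dirty L1 writeback is one L2 access."""
    return dl1_misses + dl1_writebacks + il1_misses

# Counters copied from the dl1, il1 and ul2 blocks of the run above.
dl1 = cache_rates(6169439, 5881717, 287722, 286698, 143444)
ul2 = cache_rates(431893, 399499, 32394, 24202, 19002)
l2_from_l1 = expected_l2_accesses(287722, 143444, 727)

print(f"dl1.miss_rate {dl1['miss_rate']:.4f}")  # dl1.miss_rate 0.0466
print(f"ul2.miss_rate {ul2['miss_rate']:.4f}")  # ul2.miss_rate 0.0750
print(l2_from_l1 == 431893)                      # True
```

The same identity holds for the `filter` run (3131 + 791 + 179 = 4101 ul2 accesses) and the `alphaBlend` run (2572213 + 957274 + 211 = 3529698).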

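The bpred_bimod.*_rate lines are also simple ratios of the counters named in their comments, for example addr-hits/updates. When a denominator is zero, the simulator prints "<error: divide by zero>", as in the `filter` run below, where no non-RAS JRs were seen. The sketch below maps that case to None and uses the raw counters reported for the `filter` run. The function names are hypothetical. In every run in this log, misses equals updates minus dir_hits; that relationship is observed in the numbers, not taken from the source.

```python
# Hypothetical helpers: recompute the bpred_bimod.*_rate stats from raw counters.
# A zero denominator (printed by the simulator as "<error: divide by zero>")
# is mapped to None here.

def ratio(num, den):
    return None if den == 0 else num / den

def bpred_rates(c):
    """c: raw bpred counters, keyed as in the log without the 'bpred_bimod.' prefix."""
    return {
        "bpred_addr_rate": ratio(c["addr_hits"], c["updates"]),
        "bpred_dir_rate": ratio(c["dir_hits"], c["updates"]),
        "bpred_jr_rate": ratio(c["jr_hits"], c["jr_seen"]),
        "bpred_jr_non_ras_rate": ratio(c["jr_non_ras_hits"], c["jr_non_ras_seen"]),
        "ras_rate": ratio(c["ras_hits"], c["used_ras"]),
    }

# Raw counters from the `filter` run's statistics.
filter_counts = {
    "updates": 1647342, "addr_hits": 1638967, "dir_hits": 1639058,
    "misses": 8284, "jr_hits": 45, "jr_seen": 52,
    "jr_non_ras_hits": 0, "jr_non_ras_seen": 0,
    "used_ras": 52, "ras_hits": 45,
}

for name, value in bpred_rates(filter_counts).items():
    shown = "<error: divide by zero>" if value is None else f"{value:.4f}"
    print(name, shown)
```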
--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:15:44 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
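The -mem:lat and -mem:width options set the cost of fetching an L2 block from main memory. The sketch below assumes the stock SimpleScalar 3.0 model. In that model, the first bus-width chunk of a block costs <first_chunk> cycles and every further chunk costs <inter_chunk> cycles. The -mem:maxBurstLength and -mem:minBurstLength options in this build are not part of stock 3.0, and their effect is not modeled here.

```python
# Sketch of main-memory block latency, ASSUMING the stock SimpleScalar 3.0 model:
#   latency = first_chunk + inter_chunk * (chunks - 1),
#   chunks  = ceil(block_bytes / bus_width_bytes).
# The burst-length options of this build are NOT modeled.

def mem_access_latency(block_bytes, bus_width_bytes, first_chunk, inter_chunk):
    chunks = (block_bytes + bus_width_bytes - 1) // bus_width_bytes  # ceiling division
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)

# This run: 64-byte ul2 blocks, -mem:width 64, -mem:lat 60 3 -> a single chunk.
print(mem_access_latency(64, 64, 60, 3))   # 60
# Configuration used by an earlier run in this log: -mem:width 8, -mem:lat 91 10.
print(mem_access_latency(64, 8, 91, 10))   # 161
```

With a 64-byte bus and 64-byte blocks, <inter_chunk> never comes into play under this model. Only the first-chunk latency matters.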

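This run selects a bimodal branch predictor (-bpred bimod) with a 16384-entry table (-bpred:bimod 16384). For illustration only, the sketch below shows a generic bimodal predictor built from 2-bit saturating counters. The index function, (pc >> 3) masked to the table size, and the initial counter value of 1 are simplifying assumptions. They do not reproduce SimpleScalar's own hashing or initialization.

```python
# Generic bimodal predictor: a table of 2-bit saturating counters (0..3).
# ASSUMPTIONS: index = (pc >> 3) & (size - 1); all counters start at 1.
# SimpleScalar's bpred.c uses its own hash and initialization.

class Bimodal:
    def __init__(self, size=16384):
        assert size > 0 and size & (size - 1) == 0, "table size must be a power of two"
        self.size = size
        self.table = [1] * size  # weakly not-taken

    def _index(self, pc):
        return (pc >> 3) & (self.size - 1)

    def predict(self, pc):
        """Predict taken when the counter is in one of the two upper states (2 or 3)."""
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Saturating increment on taken, saturating decrement on not-taken."""
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# Usage: a loop back-edge taken 9 times then falling through, executed 100 times.
bp = Bimodal()
pc = 0x00400200
misses = 0
for _ in range(100):
    for i in range(10):
        taken = i < 9
        misses += bp.predict(pc) != taken
        bp.update(pc, taken)
print(misses)  # 101
```

After warm-up, the counter mispredicts only the loop exit. This single miss per loop execution is the pattern a bimodal table handles well.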
sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12528998 # total simulation time in cycles
sim_IPC                      1.7064 # instructions per cycle
sim_CPI                      0.5860 # cycles per instruction
sim_exec_BW                  1.7120 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48980966 # cumulative IFQ occupancy
IFQ_fcount                 11626069 # cumulative IFQ full count
ifq_occupancy                3.9094 # avg IFQ occupancy (insn's)
ifq_rate                     1.7120 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2835 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199059413 # cumulative RUU occupancy
RUU_fcount                 12405944 # cumulative RUU full count
ruu_occupancy               15.8879 # avg RUU occupancy (insn's)
ruu_rate                     1.7120 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2801 # avg RUU occupant latency (cycle's)
ruu_full                     0.9902 # fraction of time (cycle's) RUU was full
LSQ_count                  63820211 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0938 # avg LSQ occupancy (insn's)
lsq_rate                     1.7120 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9753 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290662557 # total number of slip cycles
avg_sim_slip                13.5955 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
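The derived summary lines of the `filter` run above can be recomputed from its raw counters. The IPC, CPI, exec_BW and IPB formulas follow directly from the stat comments. The queue definitions used below are inferred from the numbers in this log, so treat them as assumptions:

- occupancy = cumulative count / sim_cycle
- latency = cumulative count / sim_total_insn
- full = full count / sim_cycle

The helper name is hypothetical.

```python
# Hypothetical helper: recompute derived sim-outorder summary stats from raw counters.
# Queue formulas (count/cycles, count/executed insts, fcount/cycles) are ASSUMED,
# inferred from the values in this log.

def derived(s):
    cyc = s["sim_cycle"]
    executed = s["sim_total_insn"]
    return {
        "sim_IPC": s["sim_num_insn"] / cyc,
        "sim_CPI": cyc / s["sim_num_insn"],
        "sim_exec_BW": executed / cyc,
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "avg_sim_slip": s["sim_slip"] / s["sim_num_insn"],
        "ifq_occupancy": s["IFQ_count"] / cyc,
        "ifq_latency": s["IFQ_count"] / executed,
        "ifq_full": s["IFQ_fcount"] / cyc,
        "ruu_occupancy": s["RUU_count"] / cyc,
        "ruu_latency": s["RUU_count"] / executed,
        "ruu_full": s["RUU_fcount"] / cyc,
        "lsq_occupancy": s["LSQ_count"] / cyc,
        "lsq_latency": s["LSQ_count"] / executed,
    }

# Raw counters from the `filter` run's statistics.
filter_stats = {
    "sim_num_insn": 21379256, "sim_total_insn": 21450114,
    "sim_num_branches": 1647342, "sim_cycle": 12528998,
    "sim_slip": 290662557,
    "IFQ_count": 48980966, "IFQ_fcount": 11626069,
    "RUU_count": 199059413, "RUU_fcount": 12405944,
    "LSQ_count": 63820211,
}

for name, value in derived(filter_stats).items():
    print(f"{name:<16}{value:.4f}")
```

Because every *_rate line equals sim_exec_BW (1.7120 here), each occupancy is that rate times the matching latency, as Little's law predicts. For example, 1.7120 x 2.2835 is approximately 3.9094 for the IFQ.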

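The mem.* lines at the end of each run follow the same pattern. The sketch below assumes 4 KiB simulator pages, which the mem.page_mem values in this log are consistent with (30 pages = 120k, 190 pages = 760k). It also treats mem.ptab_miss_rate as first-level page-table misses divided by page-table accesses, as the stat comments suggest. The helper names are hypothetical.

```python
# Hypothetical checks for the mem.* statistics, ASSUMING 4 KiB simulator pages.
PAGE_BYTES = 4096

def page_mem_kib(page_count):
    """Total simulated memory allocated, in KiB (printed as e.g. '120k')."""
    return page_count * PAGE_BYTES // 1024

def ptab_miss_rate(misses, accesses):
    """First-level page-table misses per page-table access."""
    return misses / accesses

# Values from the `filter` run above.
print(f"{page_mem_kib(30)}k")                        # 120k
print(f"{ptab_miss_rate(12303390, 178886464):.4f}")  # 0.0688
```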
--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:15:57 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
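The -cache:* and -tlb:* values above use the format <name>:<nsets>:<bsize>:<assoc>:<repl>, where <repl> is 'l' (LRU), 'f' (FIFO) or 'r' (random). The total capacity is nsets x bsize x assoc. For a TLB, the "block size" field is the page size, so the same product gives the TLB's reach. The minimal parser below is a hypothetical helper, not part of SimpleScalar.

```python
# Hypothetical parser for <name>:<nsets>:<bsize>:<assoc>:<repl> cache/TLB specs.
from typing import NamedTuple

class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def entries(self) -> int:
        """Total number of blocks (cache) or entries (TLB)."""
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        """Cache capacity, or TLB reach when bsize is the page size."""
        return self.nsets * self.bsize * self.assoc

def parse_config(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):  # LRU, FIFO, random
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)

for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_config(spec)
    print(f"{cfg.name:5} {cfg.entries:6} entries {cfg.capacity_bytes // 1024:5} KiB")
```

For this configuration, dl1 and il1 each hold 64 KiB and ul2 holds 512 KiB. The 64-entry itlb and 128-entry dtlb map 256 KiB and 512 KiB respectively.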

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861075 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  36618705 # total simulation time in cycles
sim_IPC                      0.7608 # instructions per cycle
sim_CPI                      1.3144 # cycles per instruction
sim_exec_BW                  0.7608 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 146417225 # cumulative IFQ occupancy
IFQ_fcount                 36604066 # cumulative IFQ full count
ifq_occupancy                3.9984 # avg IFQ occupancy (insn's)
ifq_rate                     0.7608 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2553 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 585670688 # cumulative RUU occupancy
RUU_fcount                 36603374 # cumulative RUU full count
ruu_occupancy               15.9938 # avg RUU occupancy (insn's)
ruu_rate                     0.7608 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.0211 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 178050607 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8623 # avg LSQ occupancy (insn's)
lsq_rate                     0.7608 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3907 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  800230042 # total number of slip cycles
avg_sim_slip                28.7236 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
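In the statistics above, misses minus replacements equals the number of block frames in each cache: 2572213 - 2571189 = 1024 for dl1 and 68321 - 60129 = 8192 for ul2. That is consistent with a replacement being counted only when a miss evicts a valid block, so each frame's first fill is a miss but not a replacement. A minimal Python check, assuming this run used the same dl1:512:64:2:l and ul2:1024:64:8:l geometry as the runs that follow (the function name cold_fills is ours):

```python
# Counters copied from the dl1/ul2 statistics above; the set/way geometry is
# assumed to match the dl1:512:64:2:l and ul2:1024:64:8:l configs used below.
CACHES = {
    "dl1": {"nsets": 512, "assoc": 2, "misses": 2572213, "replacements": 2571189},
    "ul2": {"nsets": 1024, "assoc": 8, "misses": 68321, "replacements": 60129},
}


def cold_fills(misses: int, replacements: int) -> int:
    """Misses that filled an empty frame instead of evicting a valid block."""
    return misses - replacements


for name, c in CACHES.items():
    frames = c["nsets"] * c["assoc"]
    fills = cold_fills(c["misses"], c["replacements"])
    print(f"{name}: {fills} cold fills, {frames} frames")
```

The same reading is consistent with il1.replacements being 0: its 211 misses never had to evict anything.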
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
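The memory-system figures above can be cross-checked by hand. mem.page_mem is page_count times the page size (1480k / 370 = 4k, so pages appear to be 4 KB here), and mem.ptab_miss_rate is ptab_misses / ptab_accesses. A small sketch, with the 4 KB page size inferred from these numbers rather than taken from the simulator source:

```python
PAGE_KB = 4  # inferred from this log: 1480k / 370 pages = 4k per page


def page_mem_kb(page_count: int) -> int:
    """Total simulated memory allocated, in KB, for a given page count."""
    return page_count * PAGE_KB


def miss_rate(misses: int, accesses: int) -> float:
    """First-level page table miss rate, i.e. misses / accesses."""
    return misses / accesses


print(f"mem.page_mem       {page_mem_kb(370)}k")
print(f"mem.ptab_miss_rate {miss_rate(2888570, 152017734):.4f}")
```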

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:16:20 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
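With -mem:lat 72 3 and -mem:width 8, one way to estimate the cost of an L2 miss is the chunked model of stock SimpleScalar 3.0 sim-outorder, as we understand it: the first bus transfer of a block costs <first_chunk> cycles and each further transfer costs <inter_chunk>. As far as we know, -mem:maxBurstLength and -mem:minBurstLength are not in the stock 3.0 option set, so this build may time bursts differently. The sketch below ignores them and is only an approximation:

```python
def mem_access_latency(blk_sz: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """First chunk plus (chunks - 1) inter-chunk latencies for one block.

    Approximation only: the burst-length options of this build are ignored.
    """
    chunks = (blk_sz + bus_width - 1) // bus_width  # bus transfers, rounded up
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes and -mem:width is 8 bytes in these runs.
print(mem_access_latency(64, 8, 72, 3))   # -mem:lat 72 3 (this run)
print(mem_access_latency(64, 8, 91, 10))  # -mem:lat 91 10 (top of this log)
```

For a 64-byte block that is 8 transfers: 72 + 7 x 3 = 93 cycles here, against 91 + 7 x 10 = 161 cycles under the -mem:lat 91 10 setting.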

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
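The range grammar above is small enough to parse by hand. The following Python sketch is ours, not SimpleScalar's parser: it classifies each end as an address (@), cycle (#) or instruction-count range, resolves a relative `+' end against a numeric start, and leaves program symbols such as main unresolved. It does not check that both ends use the same kind.

```python
from typing import NamedTuple, Optional, Tuple, Union


class Pos(NamedTuple):
    kind: str               # "addr" (@), "cycle" (#) or "insn" (no prefix)
    value: Union[int, str]  # a number, or an unresolved program symbol


def to_value(body: str) -> Union[int, str]:
    """Decimal or 0x-prefixed hex number; anything else is kept as a symbol."""
    if body.isdigit():
        return int(body)
    if body.lower().startswith("0x"):
        return int(body, 16)
    return body


def parse_pos(text: str) -> Optional[Pos]:
    if not text:
        return None  # this end of the range was omitted
    kinds = {"@": "addr", "#": "cycle"}
    if text[0] in kinds:
        return Pos(kinds[text[0]], to_value(text[1:]))
    return Pos("insn", to_value(text))


def parse_range(spec: str) -> Tuple[Optional[Pos], Optional[Pos]]:
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = parse_pos(start_txt)
    if not end_txt.startswith("+"):
        return start, parse_pos(end_txt)
    if start is None:
        raise ValueError("a '+' end needs a start")
    offset = int(end_txt[1:])
    if isinstance(start.value, int):
        return start, Pos(start.kind, start.value + offset)
    return start, Pos(start.kind, f"{start.value}+{offset}")  # resolve symbol later


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, parse_range(spec))
```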

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
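To make the N, M, W, X parameters concrete, here is a toy two-level predictor in Python. It is a generic textbook-style model, not a transcription of SimpleScalar's bpred.c: the PC shift of 3 (PISA instructions are 64 bits, per the -cache:icompress description above), the weakly-taken initial counters and the exact index hash are our assumptions. The constructor takes its arguments in the same order as -bpred:2lev <l1size> <l2size> <hist_size> <xor>.

```python
BR_SHIFT = 3  # assumption: drop the low PC bits of 8-byte PISA instructions


class TwoLevel:
    """Toy two-level predictor: n history registers of w bits, m 2-bit counters."""

    def __init__(self, n: int, m: int, w: int, xor: bool):
        for size in (n, m):
            if size <= 0 or size & (size - 1):
                raise ValueError("table sizes must be powers of two")
        self.n, self.m, self.w, self.xor = n, m, w, xor
        self.hist = [0] * n  # first level: branch history shift registers
        self.ctr = [2] * m   # second level: 2-bit saturating counters, weakly taken

    def _l1(self, pc: int) -> int:
        return (pc >> BR_SHIFT) & (self.n - 1)

    def _l2(self, pc: int) -> int:
        h, addr = self.hist[self._l1(pc)], pc >> BR_SHIFT
        idx = (h ^ addr) if self.xor else (h | (addr << self.w))
        return idx & (self.m - 1)

    def predict(self, pc: int) -> bool:
        return self.ctr[self._l2(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._l2(pc)  # index with the history before this outcome is shifted in
        self.ctr[i] = min(3, self.ctr[i] + 1) if taken else max(0, self.ctr[i] - 1)
        j = self._l1(pc)
        self.hist[j] = ((self.hist[j] << 1) | int(taken)) & ((1 << self.w) - 1)


# An alternating branch: a single 2-bit counter gets about half of these right,
# but a gshare-style predictor (1 history register, 16 counters, 4 history
# bits, xor) learns the pattern from the history.
gshare = TwoLevel(1, 16, 4, True)
pc = 0x00400140
outcomes = [i % 2 == 0 for i in range(40)]
correct = 0
for step, taken in enumerate(outcomes):
    if step >= 8:  # score only after an 8-branch warm-up
        correct += gshare.predict(pc) == taken
    gshare.update(pc, taken)
print(f"{correct}/32 correct after warm-up")
```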

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
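Because every field of <config> except the name and policy is numeric, a cache's capacity follows directly as <nsets> x <bsize> x <assoc>. The sketch below applies that to the configs used in these runs. Reading the TLB configs the same way, with <bsize> as the 4096-byte page size, gives the TLB reach (memory mapped) rather than a storage size; that reading and the helper names are ours.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str  # 'l' LRU, 'f' FIFO, 'r' random


def parse_cache(spec: str) -> CacheConfig:
    """Split <name>:<nsets>:<bsize>:<assoc>:<repl> into typed fields."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(c: CacheConfig) -> int:
    return c.nsets * c.bsize * c.assoc


for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    c = parse_cache(spec)
    print(f"{c.name}: {capacity_bytes(c) // 1024} KB")
```

This gives 64 KB for each L1, 512 KB for ul2, and a reach of 256 KB (itlb) and 512 KB (dtlb).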

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6842346 # total simulation time in cycles
sim_IPC                      1.9156 # instructions per cycle
sim_CPI                      0.5220 # cycles per instruction
sim_exec_BW                  1.9217 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
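The derived rates in the summary above are ratios of the raw counters, which makes them easy to sanity-check. For this run, IPC is sim_num_insn / sim_cycle, CPI is its reciprocal, IPB is sim_num_insn / sim_num_branches, sim_exec_BW also counts mis-speculated instructions (sim_total_insn / sim_cycle), and sim_inst_rate is sim_num_insn / sim_elapsed_time. A minimal Python check using the values printed above:

```python
# Raw counters copied from the summary above (matrix, -mem:lat 72 3).
sim_num_insn = 13107124
sim_total_insn = 13148990
sim_num_branches = 1010850
sim_cycle = 6842346
sim_elapsed_time = 9

derived = {
    "sim_IPC": sim_num_insn / sim_cycle,
    "sim_CPI": sim_cycle / sim_num_insn,
    "sim_exec_BW": sim_total_insn / sim_cycle,
    "sim_IPB": sim_num_insn / sim_num_branches,
    "sim_inst_rate": sim_num_insn / sim_elapsed_time,
}
for name, value in derived.items():
    print(f"{name:<15}{value:.4f}")
```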
IFQ_count                  25473232 # cumulative IFQ occupancy
IFQ_fcount                  6242705 # cumulative IFQ full count
ifq_occupancy                3.7229 # avg IFQ occupancy (insn's)
ifq_rate                     1.9217 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9373 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105172535 # cumulative RUU occupancy
RUU_fcount                  5678866 # cumulative RUU full count
ruu_occupancy               15.3708 # avg RUU occupancy (insn's)
ruu_rate                     1.9217 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9985 # avg RUU occupant latency (cycle's)
ruu_full                     0.8300 # fraction of time (cycle's) RUU was full
LSQ_count                  32147178 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6983 # avg LSQ occupancy (insn's)
lsq_rate                     1.9217 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4448 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154370573 # total number of slip cycles
avg_sim_slip                11.7776 # the average slip between issue and retirement
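The queue statistics in this run are consistent with Little's law: average occupancy (count / sim_cycle) equals the dispatch rate (sim_total_insn / sim_cycle) times the average latency, so the latency figures can be recovered as occupancy / rate. Likewise, avg_sim_slip matches sim_slip / sim_num_insn. A short Python check with the counters above (the function name is ours):

```python
sim_cycle = 6842346
sim_total_insn = 13148990
sim_num_insn = 13107124


def queue_stats(count: int) -> tuple:
    """Return (avg occupancy, avg latency in cycles) from a cumulative occupancy count."""
    occupancy = count / sim_cycle       # L: average entries resident per cycle
    rate = sim_total_insn / sim_cycle   # lambda: instructions dispatched per cycle
    return occupancy, occupancy / rate  # W = L / lambda (Little's law)


for queue, count in (("IFQ", 25473232), ("RUU", 105172535), ("LSQ", 32147178)):
    occ, lat = queue_stats(count)
    print(f"{queue}: occupancy {occ:.4f}, latency {lat:.4f}")
print(f"avg_sim_slip {154370573 / sim_num_insn:.4f}")
```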
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
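The predictor rates above are hits divided by a total. bpred_jr_non_ras_rate.PP prints <error: divide by zero> because jr_non_ras_seen.PP is 0 in this run: all 52 JRs seen were returns predicted from the RAS (used_ras.PP is also 52). The counters also satisfy misses = updates - dir_hits (1010850 - 1000659 = 10191). A Python sketch of the rate computation, with None standing in for the divide-by-zero case (the helper names are ours):

```python
from typing import Optional


def rate(hits: int, total: int) -> Optional[float]:
    """hits / total, or None where sim-outorder prints a divide-by-zero error."""
    return None if total == 0 else hits / total


def fmt(r: Optional[float]) -> str:
    return "<error: divide by zero>" if r is None else f"{r:.4f}"


# Branch-predictor counters from this run (matrix, -mem:lat 72 3).
updates, addr_hits, dir_hits = 1010850, 1000567, 1000659
print("bpred_addr_rate         ", fmt(rate(addr_hits, updates)))
print("bpred_dir_rate          ", fmt(rate(dir_hits, updates)))
print("bpred_jr_rate           ", fmt(rate(46, 52)))
print("bpred_jr_non_ras_rate.PP", fmt(rate(0, 0)))
print("misses                  ", updates - dir_hits)
```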
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
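Since il2 points at dl2 in these runs, the unified L2 sees every L1 miss plus every dirty block written back from dl1. The counters above bear that out: 5727 + 492 + 178 = 6397 ul2 accesses, and the same identity holds for the other two runs in this section. A minimal check (the run labels are our own descriptions):

```python
def l2_accesses(dl1_misses: int, dl1_writebacks: int, il1_misses: int) -> int:
    """Expected accesses to a unified L2 serving both L1s (il2 -> dl2):
    each L1 miss fetches a block, each dirty dl1 eviction writes one back."""
    return dl1_misses + dl1_writebacks + il1_misses


# (dl1.misses, dl1.writebacks, il1.misses, observed ul2.accesses)
runs = {
    "run ending at the top of this section": (2572213, 957274, 211, 3529698),
    "matrix, -mem:lat 72 3": (5727, 492, 178, 6397),
    "sort, -mem:lat 72 3": (11205, 7079, 217, 18501),
}
for label, (m, wb, im, observed) in runs.items():
    print(f"{label}: expected {l2_accesses(m, wb, im)}, observed {observed}")
```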
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
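The loader statistics above describe the simulated address space. In this run the environment block starts exactly ld_stack_size bytes below ld_stack_base (0x7fffc000 - 16384 = 0x7fff8000), and the entry point sits 0x140 bytes into the text segment. A small Python sketch that prints the segment ranges; it only does arithmetic on the logged values and assumes nothing else about the loader:

```python
# Values copied from the ld_* statistics above (matrix run).
ld_text_base, ld_text_size = 0x00400000, 23136
ld_data_base, ld_data_size = 0x10000000, 121104
ld_stack_base, ld_stack_size = 0x7FFFC000, 16384
ld_prog_entry = 0x00400140

print(f"text    {ld_text_base:#010x}-{ld_text_base + ld_text_size:#010x}")
print(f"data    {ld_data_base:#010x}-{ld_data_base + ld_data_size:#010x}")
print(f"environ {ld_stack_base - ld_stack_size:#010x}")
print(f"entry   text base + {ld_prog_entry - ld_text_base:#x}")
```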
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
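For comparing runs like the ones in this log, it helps to load each statistics block into a dictionary. The following Python sketch is a hypothetical helper, not part of SimpleScalar: it reads the `name value # description' lines after `sim: ** simulation statistics **', converts integers, hex addresses and floats, and keeps values such as 96k or <error: divide by zero> as strings.

```python
import re
from typing import Dict, Union

Value = Union[int, float, str]
# name, then either an "<error: ...>" marker or a single token, then "#"
STAT_LINE = re.compile(r"^(\S+)\s+(<error:[^>]*>|\S+)\s+#")


def parse_value(raw: str) -> Value:
    if raw.startswith("0x"):
        return int(raw, 16)
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    return raw  # e.g. "96k" or "<error: divide by zero>"


def parse_stats(text: str) -> Dict[str, Value]:
    """Collect the statistics of one sim-outorder run from its output text."""
    stats: Dict[str, Value] = {}
    in_stats = False
    for line in text.splitlines():
        if line.startswith("sim: ** simulation statistics **"):
            in_stats = True
        elif line.startswith("-----"):
            in_stats = False  # separator before the next run's output
        elif in_stats:
            m = STAT_LINE.match(line)
            if m:
                stats[m.group(1)] = parse_value(m.group(2))
    return stats


sample = """sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_stores         1013368.0000 # total number of stores committed
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR rate
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                    96k # total size of memory pages allocated
"""
stats = parse_stats(sample)
print(stats)
```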

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:16:29 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264749 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6396743 # total simulation time in cycles
sim_IPC                      1.8105 # instructions per cycle
sim_CPI                      0.5523 # cycles per instruction
sim_exec_BW                  1.9173 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18885239 # cumulative IFQ occupancy
IFQ_fcount                  3893150 # cumulative IFQ full count
ifq_occupancy                2.9523 # avg IFQ occupancy (insn's)
ifq_rate                     1.9173 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5398 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6086 # fraction of time (cycle's) IFQ was full
RUU_count                  77787283 # cumulative RUU occupancy
RUU_fcount                  3291213 # cumulative RUU full count
ruu_occupancy               12.1605 # avg RUU occupancy (insn's)
ruu_rate                     1.9173 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3423 # avg RUU occupant latency (cycle's)
ruu_full                     0.5145 # fraction of time (cycle's) RUU was full
LSQ_count                  32243881 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0407 # avg LSQ occupancy (insn's)
lsq_rate                     1.9173 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6290 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124098260 # total number of slip cycles
avg_sim_slip                10.7152 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
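Each rate line in the matrix run is a ratio of raw counters printed near it, as its comment says (for example `addr-hits/updates`). The Python sketch below recomputes those ratios from the copied counts. `safe_ratio` and `rates` are hypothetical names. A `None` result stands in for the `<error: divide by zero>` that sim-outorder prints when a denominator is zero, as it does for the filter run's non-RAS JR rate.

```python
def safe_ratio(num, den):
    """Return num/den, or None when den is 0 (printed as a divide-by-zero error)."""
    return None if den == 0 else num / den

# Raw counters copied from the matrix run's statistics.
bpred = {
    "updates": 3128464, "addr_hits": 2984765, "dir_hits": 2990777,
    "misses": 137687, "jr_hits": 427904, "jr_seen": 434901,
    "jr_non_ras_hits": 1223, "jr_non_ras_seen": 8193,
    "used_ras": 426708, "ras_hits": 426681,
}

rates = {
    "bpred_addr_rate": safe_ratio(bpred["addr_hits"], bpred["updates"]),
    "bpred_dir_rate": safe_ratio(bpred["dir_hits"], bpred["updates"]),
    "bpred_jr_rate": safe_ratio(bpred["jr_hits"], bpred["jr_seen"]),
    "bpred_jr_non_ras_rate": safe_ratio(bpred["jr_non_ras_hits"], bpred["jr_non_ras_seen"]),
    "ras_rate": safe_ratio(bpred["ras_hits"], bpred["used_ras"]),
    "dl1.miss_rate": safe_ratio(11205, 4497784),  # dl1.misses / dl1.accesses
    "ul2.miss_rate": safe_ratio(4196, 18501),     # ul2.misses / ul2.accesses
}

# For the counters shown in these runs, every update is either a direction hit or a miss.
assert bpred["updates"] - bpred["dir_hits"] == bpred["misses"]

for name, value in rates.items():
    shown = "<error: divide by zero>" if value is None else f"{value:.4f}"
    print(f"{name:24s} {shown}")
```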

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:16:36 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
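`-mem:lat` sets two costs: one for the first bus-width chunk of a transfer and one for each later chunk. The sketch below assumes the usual SimpleScalar model, in which a block moves in ceil(block size / `-mem:width`) chunks, and uses it to work out the memory latency of one 64-byte ul2 block. It ignores `-mem:maxBurstLength` and `-mem:minBurstLength`. We do not recognize those two options from the stock 3.0 release, and their effect is not documented here. `mem_access_latency` is a hypothetical helper name.

```python
def mem_access_latency(blk_size: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to transfer one block from memory (assumed chunked model).

    The block is split into ceil(blk_size / bus_width) chunks; the first costs
    first_chunk cycles and every later chunk costs inter_chunk cycles.
    Burst-length limits are NOT modeled here.
    """
    if blk_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-blk_size // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)

# ul2 blocks are 64 bytes and -mem:width is 8 in the runs shown here.
print(mem_access_latency(64, 72, 3, 8))   # -mem:lat 72 3  (fft, filter): 93
print(mem_access_latency(64, 91, 10, 8))  # -mem:lat 91 10 (matrix): 161
```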

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
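The range grammar above can be parsed in a few lines. The Python sketch below uses the hypothetical names `Pos` and `parse_range`. It labels each end as an address (`@`), a cycle count (`#`) or an instruction count, and treats any non-numeric value as a program symbol. It also folds a numeric `+` offset into an absolute end, so `1000:+100` and `1000:1100` compare equal. One assumption is ours: a relative end inherits the kind of the start. That is our reading of the help text, not something confirmed from the simulator source.

```python
from typing import NamedTuple, Optional, Tuple, Union

class Pos(NamedTuple):
    kind: str               # "addr" (@), "cycle" (#) or "inst" (no prefix)
    value: Union[int, str]  # a number, or a program symbol such as "main"
    relative: bool = False  # True for a '+' end that could not be resolved

_PREFIX_KINDS = {"@": "addr", "#": "cycle"}

def _parse_pos(text: str, is_end: bool) -> Optional[Pos]:
    """Parse one side of a range; an empty side means open-ended."""
    if not text:
        return None
    kind, relative = "inst", False
    if text[0] in _PREFIX_KINDS:
        kind, text = _PREFIX_KINDS[text[0]], text[1:]
    elif is_end and text[0] == "+":
        relative, text = True, text[1:]
    value: Union[int, str]
    try:
        value = int(text, 0)  # accepts decimal and 0x-prefixed hex
    except ValueError:
        value = text          # assume it names a program symbol
    return Pos(kind, value, relative)

def parse_range(spec: str) -> Tuple[Optional[Pos], Optional[Pos]]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':' separator")
    start = _parse_pos(start_text, is_end=False)
    end = _parse_pos(end_text, is_end=True)
    if end is not None and end.relative:
        if start is None:
            raise ValueError("a '+' end needs a start position")
        if isinstance(start.value, int) and isinstance(end.value, int):
            # 1000:+100 == 1000:1100, so store the absolute end
            end = Pos(start.kind, start.value + end.value)
        else:
            # e.g. @main:+278; resolving it needs the symbol table
            end = end._replace(kind=start.kind)
    return start, end

for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(f"{spec:12s} -> {parse_range(spec)}")
```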

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
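`-bpred:2lev` takes its arguments in the order `<l1size> <l2size> <hist_size> <xor>`, that is N, M, W, X. A small helper can therefore turn a sample scheme into an argument string ready to paste. The Python sketch below is illustrative only. `two_level_args` is a hypothetical name, PAp is left out, and `comb` needs its own options. With W = 8 and M = 1024 it reproduces the default `1 1024 8 0` from the option dump, which under the sample taxonomy is a GAp configuration.

```python
def two_level_args(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Build '<l1size> <l2size> <hist_size> <xor>' for one sample scheme.

    w: history (shift register) width W; n: first-level entries N (PAg only);
    m: second-level entries M (GAp only).
    """
    if w < 1:
        raise ValueError("history width W must be at least 1")
    scheme = scheme.lower()
    if scheme == "gag":
        l1, l2, xor = 1, 2 ** w, 0
    elif scheme == "gshare":
        l1, l2, xor = 1, 2 ** w, 1
    elif scheme == "gap":
        if m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        l1, l2, xor = 1, m, 0
    elif scheme == "pag":
        l1, l2, xor = n, 2 ** w, 0
    else:
        raise ValueError(f"unsupported scheme {scheme!r}")
    return f"{l1} {l2} {w} {xor}"

# prints: -bpred 2lev -bpred:2lev 1 4096 12 1
print("-bpred 2lev -bpred:2lev", two_level_args("gshare", 12))
```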

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
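A cache's capacity follows directly from its `<config>` string: nsets x bsize x assoc bytes. The sketch below applies that to the configurations used in these runs. `parse_cache_config` and `CacheConfig` are hypothetical names. It assumes power-of-two set counts and block sizes, so the offset and index widths are plain log2 values. The TLB options take the same format, with the page size in the `<bsize>` field, so the same product gives TLB reach. For example, itlb:16:4096:4:l covers 16 x 4 x 4096 bytes = 256 KiB.

```python
from typing import NamedTuple

_REPL = {"l": "LRU", "f": "FIFO", "r": "random"}

def _is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0

class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int  # bytes per block (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total bytes covered: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize

    @property
    def offset_bits(self) -> int:
        return self.bsize.bit_length() - 1

    @property
    def index_bits(self) -> int:
        return self.nsets.bit_length() - 1

def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in _REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), _REPL[repl])
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)) or cfg.assoc < 1:
        raise ValueError("this sketch expects power-of-two nsets/bsize and assoc >= 1")
    return cfg

for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
    c = parse_cache_config(spec)
    print(f"{c.name}: {c.capacity // 1024} KiB, {c.assoc}-way {c.repl}, "
          f"{c.offset_bits} offset bits, {c.index_bits} index bits")
```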



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375212 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9574968 # total simulation time in cycles
sim_IPC                      1.3912 # instructions per cycle
sim_CPI                      0.7188 # cycles per instruction
sim_exec_BW                  1.3969 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  37295687 # cumulative IFQ occupancy
IFQ_fcount                  9173277 # cumulative IFQ full count
ifq_occupancy                3.8951 # avg IFQ occupancy (insn's)
ifq_rate                     1.3969 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7884 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 150098075 # cumulative RUU occupancy
RUU_fcount                  9032334 # cumulative RUU full count
ruu_occupancy               15.6761 # avg RUU occupancy (insn's)
ruu_rate                     1.3969 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.2221 # avg RUU occupant latency (cycle's)
ruu_full                     0.9433 # fraction of time (cycle's) RUU was full
LSQ_count                  79092692 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2604 # avg LSQ occupancy (insn's)
lsq_rate                     1.3969 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.9134 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  248905873 # total number of slip cycles
avg_sim_slip                18.6854 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
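The fft run's throughput, queue and slip statistics are also simple formulas over a few cumulative counters. Occupancies and "full" fractions divide by `sim_cycle`, latencies divide by `sim_total_insn`, and `avg_sim_slip` divides by `sim_num_insn`. The Python sketch below encodes these formulas. They are stated here as consistent with the printed values to four decimals, not quoted from the simulator source. `derived` and `stats` are hypothetical names.

```python
# Counters copied from the fft run's statistics.
stats = {
    "sim_num_insn": 13320908, "sim_total_insn": 13375212,
    "sim_num_branches": 387182, "sim_cycle": 9574968,
    "IFQ_count": 37295687, "IFQ_fcount": 9173277,
    "RUU_count": 150098075, "RUU_fcount": 9032334,
    "LSQ_count": 79092692, "sim_slip": 248905873,
}

def derived(s: dict) -> dict:
    """Recompute sim-outorder's summary metrics from raw counters."""
    cyc, total, committed = s["sim_cycle"], s["sim_total_insn"], s["sim_num_insn"]
    return {
        "sim_IPC": committed / cyc,
        "sim_CPI": cyc / committed,
        "sim_exec_BW": total / cyc,              # mis-speculated + committed
        "sim_IPB": committed / s["sim_num_branches"],
        "ifq_occupancy": s["IFQ_count"] / cyc,
        "ifq_latency": s["IFQ_count"] / total,
        "ifq_full": s["IFQ_fcount"] / cyc,
        "ruu_occupancy": s["RUU_count"] / cyc,
        "ruu_latency": s["RUU_count"] / total,
        "ruu_full": s["RUU_fcount"] / cyc,
        "lsq_occupancy": s["LSQ_count"] / cyc,
        "lsq_latency": s["LSQ_count"] / total,
        "avg_sim_slip": s["sim_slip"] / committed,
    }

for name, value in derived(stats).items():
    print(f"{name:14s} {value:.4f}")
```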

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:16:48 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12603116 # total simulation time in cycles
sim_IPC                      1.6963 # instructions per cycle
sim_CPI                      0.5895 # cycles per instruction
sim_exec_BW                  1.7020 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49254470 # cumulative IFQ occupancy
IFQ_fcount                 11694445 # cumulative IFQ full count
ifq_occupancy                3.9081 # avg IFQ occupancy (insn's)
ifq_rate                     1.7020 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2962 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200155046 # cumulative RUU occupancy
RUU_fcount                 12474320 # cumulative RUU full count
ruu_occupancy               15.8814 # avg RUU occupancy (insn's)
ruu_rate                     1.7020 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3312 # avg RUU occupant latency (cycle's)
ruu_full                     0.9898 # fraction of time (cycle's) RUU was full
LSQ_count                  64163807 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0911 # avg LSQ occupancy (insn's)
lsq_rate                     1.7020 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9913 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292101588 # total number of slip cycles
avg_sim_slip                13.6629 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:17:01 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
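As an illustration of the range syntax above, here is a minimal Python sketch that splits a range string into its kind and numeric bounds. It is not the simulator's own parser: it assumes numbers are decimal or 0x-prefixed, takes the range kind from whichever end carries a `@' or `#' prefix, and resolves program symbols through a hypothetical SYMBOLS table (the address given for main is made up).

```python
from typing import Dict, Optional, Tuple

# Hypothetical symbol table for illustration only; the real simulator
# resolves program symbols from the loaded binary.
SYMBOLS: Dict[str, int] = {"main": 0x00400200}

KINDS = {"@": "addr", "#": "cycle"}


def _value(text: str, symbols: Dict[str, int]) -> int:
    """Resolve a program symbol or a decimal / 0x-hex literal."""
    if text in symbols:
        return symbols[text]
    return int(text, 0)


def parse_range(spec: str, symbols: Dict[str, int] = SYMBOLS
                ) -> Tuple[str, Optional[int], Optional[int]]:
    """Split '{{@|#}<start>}:{{@|#|+}<end>}' into (kind, start, end).

    kind is 'addr' for '@', 'cycle' for '#', otherwise 'insn';
    None marks an open end of the range.
    """
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    kind = KINDS.get(start_txt[:1]) or KINDS.get(end_txt[:1]) or "insn"

    # Strip the '@' / '#' prefix from the start, then resolve it.
    if start_txt[:1] in KINDS:
        start_txt = start_txt[1:]
    start = _value(start_txt, symbols) if start_txt else None

    # The end may also carry '+', meaning "relative to the start".
    relative = end_txt.startswith("+")
    if end_txt[:1] in KINDS or relative:
        end_txt = end_txt[1:]
    end = _value(end_txt, symbols) if end_txt else None
    if relative:
        if start is None or end is None:
            raise ValueError("a '+' end needs both a start and an offset")
        end += start
    return kind, start, end
```

With these assumptions, parse_range("1000:+100") returns ('insn', 1000, 1100), matching the 1000:+100 == 1000:1100 equivalence stated above.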

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
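The sample predictors can be turned into -bpred:2lev arguments mechanically. The Python sketch below (twolev_args is a hypothetical helper, and PAp is left out) emits the four values in the order given on the -bpred:2lev option line, <l1size> <l2size> <hist_size> <xor>, i.e., N, M, W, X.

```python
from typing import List, Optional


def twolev_args(scheme: str, w: int, n: int = 1,
                m: Optional[int] = None) -> List[int]:
    """Return [l1size, l2size, hist_size, xor] (N, M, W, X) for -bpred:2lev.

    PAp is deliberately omitted from this sketch.
    """
    if w <= 0:
        raise ValueError("history width W must be positive")
    table = 2 ** w  # one 2nd-level counter per W-bit history pattern
    if scheme == "GAg":
        return [1, table, w, 0]
    if scheme == "gshare":
        return [1, table, w, 1]  # X=1: xor history with branch address
    if scheme == "PAg":
        return [n, table, w, 0]
    if scheme == "GAp":
        if m is None or m <= table:
            raise ValueError("GAp needs M > 2^W")
        return [1, m, w, 0]
    raise ValueError(f"unsupported scheme {scheme!r}")


if __name__ == "__main__":
    args = twolev_args("gshare", 10)
    print("-bpred 2lev -bpred:2lev " + " ".join(map(str, args)))
```

For example, twolev_args("GAp", 8, m=1024) gives [1, 1024, 8, 0], which is the default -bpred:2lev setting in the option listing; these runs use -bpred bimod, so that setting is inactive here.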

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
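Because each cache (or TLB) holds <nsets> x <assoc> blocks of <bsize> bytes, its capacity follows directly from the config string. A minimal Python sketch, assuming the five colon-separated fields listed above (the class and function names are hypothetical):

```python
from dataclasses import dataclass

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Bytes held (for a TLB, bytes mapped) = nsets * bsize * assoc."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(
            f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc),
                       REPL_POLICIES[repl])
```

Applied to the configurations used in these runs, dl1 and il1 (512:64:2) each come out at 64 KB and ul2 (1024:64:8) at 512 KB. For a TLB, <bsize> is the page size, so the same product gives the amount of memory the TLB can map, e.g., 512 KB for dtlb:32:4096:4:l.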

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861339 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  38111262 # total simulation time in cycles
sim_IPC                      0.7310 # instructions per cycle
sim_CPI                      1.3680 # cycles per instruction
sim_exec_BW                  0.7311 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 152359865 # cumulative IFQ occupancy
IFQ_fcount                 38089726 # cumulative IFQ full count
ifq_occupancy                3.9978 # avg IFQ occupancy (insn's)
ifq_rate                     0.7311 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4685 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 609442667 # cumulative RUU occupancy
RUU_fcount                 38088968 # cumulative RUU full count
ruu_occupancy               15.9911 # avg RUU occupancy (insn's)
ruu_rate                     0.7311 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.8741 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 185233090 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8603 # avg LSQ occupancy (insn's)
lsq_rate                     0.7311 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6484 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  831184372 # total number of slip cycles
avg_sim_slip                29.8347 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
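Several of the ratios above can be recomputed from the raw counters, which is a quick sanity check when comparing runs. The Python sketch below assumes only the `name value # description' line layout, skips values that are not plain numbers (hex addresses, sizes such as 1480k, error strings), and uses counters copied from this run; the helper names are hypothetical.

```python
import re
from typing import Dict

# Counters copied from the alphaBlend statistics above.
SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  38111262 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
ul2.accesses                3529698 # total number of accesses
ul2.misses                    68321 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                  1480k # total size of memory pages allocated
"""

STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> Dict[str, float]:
    """Collect numeric 'name value # description' lines."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        try:
            stats[match.group(1)] = float(match.group(2))
        except ValueError:
            pass  # hex addresses, sizes like '1480k', error strings
    return stats


def derived(stats: Dict[str, float]) -> Dict[str, float]:
    """Recompute a few reported ratios from their raw counters."""
    return {
        "IPC": stats["sim_num_insn"] / stats["sim_cycle"],
        "CPI": stats["sim_cycle"] / stats["sim_num_insn"],
        "dl1_miss_rate": stats["dl1.misses"] / stats["dl1.accesses"],
        "ul2_miss_rate": stats["ul2.misses"] / stats["ul2.accesses"],
    }


if __name__ == "__main__":
    for key, value in derived(parse_stats(SAMPLE)).items():
        print(f"{key:15s} {value:.4f}")
```

On these counters, the recomputed values round to the reported sim_IPC (0.7310), sim_CPI (1.3680), dl1.miss_rate (0.2972) and ul2.miss_rate (0.0194).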

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:17:25 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6818790 # total simulation time in cycles
sim_IPC                      1.9222 # instructions per cycle
sim_CPI                      0.5202 # cycles per instruction
sim_exec_BW                  1.9283 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25387408 # cumulative IFQ occupancy
IFQ_fcount                  6221249 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9283 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9307 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104828303 # cumulative RUU occupancy
RUU_fcount                  5657410 # cumulative RUU full count
ruu_occupancy               15.3734 # avg RUU occupancy (insn's)
ruu_rate                     1.9283 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9723 # avg RUU occupant latency (cycle's)
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
LSQ_count                  32032170 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6976 # avg LSQ occupancy (insn's)
lsq_rate                     1.9283 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4361 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153911549 # total number of slip cycles
avg_sim_slip                11.7426 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
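Every rate in the statistics block above is a plain ratio of two raw counters, as its comment states (e.g. `addr-hits/updates`). The minimal Python sketch below recomputes those rates for this run. `ratio` is a hypothetical helper that returns `None` in the cases where sim-outorder prints `<error: divide by zero>`. The closing assertions check identities that these particular counts satisfy. They are inferred from the numbers, not documented by the simulator, and are consistent with the unified L2 being accessed on every L1 miss and on every L1-D writeback. The branch-predictor identities also hold for the sort and fft runs below, and the L2 identity holds for the sort run.

```python
from typing import Dict, Optional

# Raw counters copied from the statistics block above.
stats: Dict[str, int] = {
    "bpred.updates": 1010850, "bpred.addr_hits": 1000567,
    "bpred.dir_hits": 1000659, "bpred.misses": 10191,
    "bpred.jr_hits": 46, "bpred.jr_seen": 52,
    "bpred.jr_non_ras_hits": 0, "bpred.jr_non_ras_seen": 0,
    "bpred.used_ras": 52, "bpred.ras_hits": 46,
    "il1.misses": 178, "dl1.accesses": 4013174, "dl1.misses": 5727,
    "dl1.writebacks": 492, "ul2.accesses": 6397, "ul2.misses": 2268,
}


def ratio(num: int, den: int) -> Optional[float]:
    """num/den, or None where sim-outorder prints '<error: divide by zero>'."""
    return num / den if den else None


s = stats
rates = {
    "bpred_addr_rate": ratio(s["bpred.addr_hits"], s["bpred.updates"]),
    "bpred_dir_rate": ratio(s["bpred.dir_hits"], s["bpred.updates"]),
    "bpred_jr_rate": ratio(s["bpred.jr_hits"], s["bpred.jr_seen"]),
    "bpred_jr_non_ras_rate": ratio(s["bpred.jr_non_ras_hits"], s["bpred.jr_non_ras_seen"]),
    "ras_rate": ratio(s["bpred.ras_hits"], s["bpred.used_ras"]),
    "dl1.miss_rate": ratio(s["dl1.misses"], s["dl1.accesses"]),
    "ul2.miss_rate": ratio(s["ul2.misses"], s["ul2.accesses"]),
}
for name, value in rates.items():
    shown = "<error: divide by zero>" if value is None else f"{value:.4f}"
    print(f"{name:24s} {shown}")

# Identities these counts satisfy; inferred from the numbers, not documented.
assert s["bpred.misses"] == s["bpred.updates"] - s["bpred.dir_hits"]
assert s["bpred.used_ras"] == s["bpred.jr_seen"] - s["bpred.jr_non_ras_seen"]
assert s["bpred.jr_hits"] == s["bpred.ras_hits"] + s["bpred.jr_non_ras_hits"]
assert s["ul2.accesses"] == s["dl1.misses"] + s["dl1.writebacks"] + s["il1.misses"]
```

The printed rates round to the same four-digit values the simulator reported above, including the divide-by-zero case for the non-RAS JR rate.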

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:17:33 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
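The `-mem:lat` option takes two numbers: a first-chunk latency and an inter-chunk latency. `-mem:width` sets how many bytes cross the bus per chunk. The minimal Python sketch below applies that model. It assumes that a block is split into ceil(block size / bus width) chunks, and it ignores `-mem:minBurstLength` and `-mem:maxBurstLength`, because this output does not describe how those options affect timing. `mem_access_latency` is a hypothetical name.

```python
def mem_access_latency(blk_sz: int, bus_width: int, first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block over the memory bus: the first chunk costs
    `first_chunk`, each further bus-width chunk costs `inter_chunk`."""
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_sz + bus_width - 1) // bus_width  # round up to whole bus transfers
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks, as set by -cache:dl2 ul2:1024:64:8:l in these runs.
print(mem_access_latency(64, 16, 72, 3))   # -mem:width 16 -mem:lat 72 3  -> 81
print(mem_access_latency(64, 8, 91, 10))   # -mem:width 8 -mem:lat 91 10  -> 161
```

Under these assumptions, a 64-byte block costs 81 cycles with the memory settings shown above. With the `-mem:width 8 -mem:lat 91 10` settings from the first command line in this log, the same block costs 161 cycles.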

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
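The range grammar above is small enough to parse by hand. The Python sketch below turns a range into a `(kind, value)` pair for each end, with `None` for an omitted end. It mimics the grammar only; sim-outorder does its own parsing. The names `parse_range` and `PREFIX_KIND`, and the `symbols` dictionary that stands in for the program's symbol lookup, are hypothetical.

```python
from typing import Dict, Optional, Tuple

Bound = Optional[Tuple[str, int]]  # (kind, value), or None when that end is omitted

PREFIX_KIND = {"@": "addr", "#": "cycle"}  # no prefix means an instruction count


def _number(text: str, symbols: Dict[str, int]) -> int:
    """Decimal or 0x-hex literal; anything else is looked up as a program symbol."""
    try:
        return int(text, 0)
    except ValueError:
        return symbols[text]


def _bound(text: str, symbols: Dict[str, int]) -> Tuple[str, int]:
    if text[0] in PREFIX_KIND:
        return PREFIX_KIND[text[0]], _number(text[1:], symbols)
    return "insn", _number(text, symbols)


def parse_range(spec: str, symbols: Optional[Dict[str, int]] = None) -> Tuple[Bound, Bound]:
    symbols = symbols or {}
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _bound(start_text, symbols) if start_text else None
    if not end_text:
        end = None
    elif end_text[0] == "+":
        # '+' makes the end relative to the start and keeps the start's kind.
        if start is None:
            raise ValueError("a '+' end needs an explicit start")
        end = (start[0], start[1] + _number(end_text[1:], symbols))
    else:
        end = _bound(end_text, symbols)
    return start, end
```

For example, `parse_range("1000:+100")` yields `(('insn', 1000), ('insn', 1100))`, which matches the `1000:+100 == 1000:1100` rule. `parse_range(":")` yields `(None, None)`, meaning the whole execution is traced.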

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
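The `-bpred:2lev` flag takes its values as `<l1size> <l2size> <hist_size> <xor>`, that is, in N, M, W, X order. The Python sketch below assumes that order and turns the sample schemes into flag arguments. `twolev_args` is a hypothetical helper, and PAp is omitted for brevity.

```python
from typing import Optional


def twolev_args(scheme: str, w: int, n: int = 1, m: Optional[int] = None) -> str:
    """Build the -bpred:2lev argument string in N M W X order
    (<l1size> <l2size> <hist_size> <xor>) for a sample predictor."""
    if scheme == "GAg":       # one global history register, 2^W counters
        n, m, x = 1, 2 ** w, 0
    elif scheme == "GAp":     # one global history register, more than 2^W counters
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        n, x = 1, 0
    elif scheme == "PAg":     # N per-address history registers, 2^W shared counters
        m, x = 2 ** w, 0
    elif scheme == "gshare":  # global history XORed with the branch address
        n, m, x = 1, 2 ** w, 1
    else:
        raise ValueError(f"unsupported scheme {scheme!r}")
    return f"-bpred:2lev {n} {m} {w} {x}"
```

For instance, `twolev_args("GAp", 8, m=1024)` returns `-bpred:2lev 1 1024 8 0`, which is the default shown on the `-bpred:2lev` line of this output.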

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
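Each `<config>` string consists of five colon-separated fields, so a cache's capacity follows directly as nsets x bsize x assoc bytes. The Python sketch below parses the string and computes that capacity. `parse_cache` and `CacheConfig` are hypothetical names. For the TLB lines, the sketch assumes that bsize is the page size (4096 here), so the same product gives the TLB's reach.

```python
from typing import NamedTuple

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        # Bytes held (or, for a TLB, bytes mapped): sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache(config: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = config.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for cfg in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = parse_cache(cfg)
    print(f"{c.name}: {c.capacity // 1024} KiB, {c.assoc}-way, {c.repl}")
```

With the configurations used in these runs, dl1 holds 64 KiB, ul2 holds 512 KiB, and the 32-set, 4-way data TLB maps 512 KiB.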

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
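The pointer rule can be sketched in Python as a single lookup. An instruction-side level whose value is `dl1` or `dl2` is served by that data-side cache; otherwise its own `<config>` applies. The `flags` dictionary, keyed by level name, and the functions `resolve` and `is_unified` are hypothetical.

```python
from typing import Dict

DATA_LEVELS = ("dl1", "dl2")


def resolve(flags: Dict[str, str], level: str) -> str:
    """Return the <config> string (or 'none') that actually serves `level`."""
    value = flags[level]
    # Instruction-side levels may name a data-side level instead of a config.
    if value in DATA_LEVELS:
        value = flags[value]
    return value


def is_unified(flags: Dict[str, str], inst_level: str, data_level: str) -> bool:
    """True only if the instruction level points at the data level by name;
    two separate caches with identical geometry are not unified."""
    return flags[inst_level] == data_level


# Cache flags as they appear in the option dumps of this log.
flags = {"il1": "il1:512:64:2:l", "il2": "dl2",
         "dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l"}
print(resolve(flags, "il2"), is_unified(flags, "il2", "dl2"))
```

`is_unified` compares pointers rather than resolved configurations. This is deliberate: il1 and dl1 above have the same geometry but are still two separate caches.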



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264701 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6364079 # total simulation time in cycles
sim_IPC                      1.8198 # instructions per cycle
sim_CPI                      0.5495 # cycles per instruction
sim_exec_BW                  1.9272 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18764807 # cumulative IFQ occupancy
IFQ_fcount                  3863042 # cumulative IFQ full count
ifq_occupancy                2.9486 # avg IFQ occupancy (insn's)
ifq_rate                     1.9272 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5300 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6070 # fraction of time (cycle's) IFQ was full
RUU_count                  77305303 # cumulative RUU occupancy
RUU_fcount                  3261117 # cumulative RUU full count
ruu_occupancy               12.1471 # avg RUU occupancy (insn's)
ruu_rate                     1.9272 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3031 # avg RUU occupant latency (cycle's)
ruu_full                     0.5124 # fraction of time (cycle's) RUU was full
LSQ_count                  32135197 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0495 # avg LSQ occupancy (insn's)
lsq_rate                     1.9272 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6201 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123507596 # total number of slip cycles
avg_sim_slip                10.6642 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
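The throughput and queue statistics in this sort run can be reproduced from the raw counters alone. In this output, each queue's dispatch rate equals sim_exec_BW (1.9272), and the averages match the following formulas:

- occupancy = cumulative count / sim_cycle
- latency = cumulative count / sim_total_insn
- full = full count / sim_cycle

Together these satisfy Little's law: occupancy = rate x latency. The Python sketch below reconstructs the metrics under that reading. The formulas are inferred from the numbers, not taken from the simulator's source.

```python
# Raw counters copied from the sort run above.
sim_num_insn = 11581513    # committed instructions
sim_total_insn = 12264701  # committed + mis-speculated instructions
sim_num_branches = 3128464
sim_cycle = 6364079
sim_slip = 123507596

queues = {
    # name: (cumulative occupancy count, cumulative full count)
    "ifq": (18764807, 3863042),
    "ruu": (77305303, 3261117),
    "lsq": (32135197, 0),
}

ipc = sim_num_insn / sim_cycle
cpi = sim_cycle / sim_num_insn
exec_bw = sim_total_insn / sim_cycle
ipb = sim_num_insn / sim_num_branches
avg_slip = sim_slip / sim_num_insn
print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  exec_BW {exec_bw:.4f}  IPB {ipb:.4f}  slip {avg_slip:.4f}")

metrics = {}
for name, (count, fcount) in queues.items():
    occupancy = count / sim_cycle     # average entries in the queue per cycle
    latency = count / sim_total_insn  # average cycles an instruction spends in it
    full = fcount / sim_cycle         # fraction of cycles the queue was full
    # Little's law: average occupancy = arrival rate x average time in queue.
    assert abs(occupancy - exec_bw * latency) < 1e-9
    metrics[name] = (occupancy, latency, full)
    print(f"{name}: occupancy {occupancy:.4f}  latency {latency:.4f}  full {full:.4f}")
```

The printed values agree with the reported ones to four decimal places.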

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:17:41 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 10 # total simulation time in seconds
sim_inst_rate          1332090.8000 # simulation speed (in insts/sec)
sim_total_insn             13375116 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9349714 # total simulation time in cycles
sim_IPC                      1.4247 # instructions per cycle
sim_CPI                      0.7019 # cycles per instruction
sim_exec_BW                  1.4305 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  36428943 # cumulative IFQ occupancy
IFQ_fcount                  8956591 # cumulative IFQ full count
ifq_occupancy                3.8963 # avg IFQ occupancy (insn's)
ifq_rate                     1.4305 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7236 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 146628252 # cumulative RUU occupancy
RUU_fcount                  8815672 # cumulative RUU full count
ruu_occupancy               15.6826 # avg RUU occupancy (insn's)
ruu_rate                     1.4305 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.9628 # avg RUU occupant latency (cycle's)
ruu_full                     0.9429 # fraction of time (cycle's) RUU was full
LSQ_count                  77027739 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2385 # avg LSQ occupancy (insn's)
lsq_rate                     1.4305 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7590 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  243371337 # total number of slip cycles
avg_sim_slip                18.2699 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto the ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:17:51 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
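
The -mem:lat, -mem:width and cache block size settings above combine into the cost of one block fill from main memory. A minimal Python sketch, assuming the stock SimpleScalar 3.0 rule (the first chunk costs <first_chunk> cycles and every further bus-width chunk costs <inter_chunk> cycles). The -mem:minBurstLength and -mem:maxBurstLength options are not in stock 3.0, so their effect in this build is not modeled, and mem_access_latency is an illustrative name:

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block from memory (stock SimpleScalar 3.0 rule, assumed).

    The block is split into ceil(block_bytes / bus_width) bus transfers; the
    first costs first_chunk cycles and each later one inter_chunk cycles.
    Burst-length limits (-mem:min/maxBurstLength) are NOT modeled.
    """
    chunks = (block_bytes + bus_width - 1) // bus_width  # round up
    if chunks < 1:
        raise ValueError("block_bytes and bus_width must be positive")
    return first_chunk + (chunks - 1) * inter_chunk


if __name__ == "__main__":
    # 64-byte blocks, as in ul2:1024:64:8:l
    print(mem_access_latency(64, 16, 72, 3))   # this run: -mem:width 16 -mem:lat 72 3
    print(mem_access_latency(64, 8, 91, 10))   # matrix run: -mem:width 8 -mem:lat 91 10
```

Under that assumption a 64-byte ul2 block costs 72 + 3*(4-1) = 81 cycles in this run, against 91 + 10*(8-1) = 161 cycles with the matrix run's -mem:width 8 -mem:lat 91 10.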

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
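
The range grammar above can be sketched as a small Python parser. The (kind, value) tuples, the base-0 integer parsing (so 0x... addresses are accepted) and the symbols dictionary standing in for the program symbol table are assumptions for illustration, not SimpleScalar's own range code:

```python
from typing import Dict, Optional, Tuple

Pos = Tuple[str, int]            # (kind, value), kind in {"inst", "addr", "cycle"}
PREFIX = {"@": "addr", "#": "cycle"}


def parse_pos(text: str, symbols: Dict[str, int]) -> Pos:
    """Parse one endpoint such as '1000', '#0', '@main' or '@0x400140'."""
    kind = "inst"                # no prefix: instruction count
    if text[:1] in PREFIX:
        kind, text = PREFIX[text[0]], text[1:]
    if text in symbols:          # program symbol (hypothetical lookup table)
        return kind, symbols[text]
    return kind, int(text, 0)    # base 0 accepts decimal and 0x... (assumption)


def parse_range(spec: str,
                symbols: Optional[Dict[str, int]] = None
                ) -> Tuple[Optional[Pos], Optional[Pos]]:
    """Split '<start>:<end>' into endpoints; None means that end is open."""
    symbols = symbols or {}
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = parse_pos(start_txt, symbols) if start_txt else None
    if not end_txt:
        end = None
    elif end_txt.startswith("+"):            # relative: same kind as start
        if start is None:
            raise ValueError("a '+' end needs an explicit start")
        end = (start[0], start[1] + int(end_txt[1:], 0))
    else:
        end = parse_pos(end_txt, symbols)
    return start, end


if __name__ == "__main__":
    syms = {"main": 0x400200}                # hypothetical address for 'main'
    for spec in ["#0:#1000", "@2000:", ":1500", ":", "1000:+100", "@main:+278"]:
        print(spec, parse_range(spec, syms))
```

For example, 1000:+100 parses to (('inst', 1000), ('inst', 1100)), matching the 1000:1100 equivalence stated above.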

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
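
To put the predictor settings in these runs in perspective, the Python sketch below counts state bits for a configuration given in -bpred:2lev order (<l1size> <l2size> <hist_size> <xor>, as in the default 1 1024 8 0, a GAp shape since 1024 > 2^8) and forms a second-level index. The index layout (history in the low bits, address bits above, optionally XORed in gshare style) and the 3-bit PC shift for 8-byte PISA instructions are assumptions modeled on common gshare descriptions; the function names are hypothetical:

```python
def two_level_bits(l1size: int, l2size: int, hist_width: int) -> int:
    """State bits: l1size history registers of hist_width bits each,
    plus l2size 2-bit saturating counters."""
    return l1size * hist_width + 2 * l2size


def bimod_bits(table_size: int) -> int:
    """A bimodal table is just table_size 2-bit counters."""
    return 2 * table_size


def l2_index(pc: int, history: int, l2size: int, hist_width: int,
             use_xor: bool, pc_shift: int = 3) -> int:
    """Second-level counter index (sketch; l2size must be a power of two).

    pc_shift=3 drops the low bits of 8-byte PISA instruction addresses
    (an assumption). History fills the low hist_width bits; with use_xor
    it is XORed with the low address bits first (gshare style).
    """
    addr = pc >> pc_shift
    low_mask = (1 << hist_width) - 1
    if use_xor:
        idx = ((history ^ addr) & low_mask) | (addr << hist_width)
    else:
        idx = (history & low_mask) | (addr << hist_width)
    return idx & (l2size - 1)


if __name__ == "__main__":
    print(two_level_bits(1, 1024, 8))   # default -bpred:2lev 1 1024 8 0
    print(bimod_bits(16384) // 8)       # -bpred:bimod 16384, in bytes
```

With these definitions the default 2-level table holds 8 + 2*1024 = 2056 bits, while the -bpred:bimod 16384 table used here holds 32768 bits (4 KB).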

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
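
The <config> format can be checked with a short Python parser that also reports total capacity (nsets x bsize x assoc). For the TLB configs the block size is the page size, so the same product gives the TLB reach. CacheConfig and parse_cache are illustrative names, not part of SimpleScalar:

```python
from typing import NamedTuple

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int    # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Bytes covered: sets x block size x ways (TLB reach for a TLB)."""
        return self.nsets * self.bsize * self.assoc


def parse_cache(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields in {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for spec in ["il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"]:
        c = parse_cache(spec)
        print(f"{c.name:5s} {c.capacity // 1024:4d} KB  {REPL[c.repl]}")
```

For this run's settings that gives 64 KB for il1 and dl1, 512 KB for ul2, and a reach of 256 KB for itlb and 512 KB for dtlb.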

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
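
These runs use the first form (-cache:il2 dl2, with dl2 named ul2). A quick consistency check on the statistics in this log: in both the matrix and filter runs, ul2.accesses equals il1.misses + dl1.misses + dl1.writebacks exactly, which is what one shared L2 behind a write-back dl1 would see. This is an observation from these two runs, not a documented guarantee; the Python sketch below restates it with the logged counts (expected_ul2_accesses is an illustrative name):

```python
def expected_ul2_accesses(il1_misses: int, dl1_misses: int,
                          dl1_writebacks: int) -> int:
    """With -cache:il2 dl2 one L2 serves both L1s, so (as observed in this
    log) it sees every il1 miss, every dl1 miss and every dl1 writeback."""
    return il1_misses + dl1_misses + dl1_writebacks


# (il1.misses, dl1.misses, dl1.writebacks, ul2.accesses) copied from this log.
RUNS = {
    "matrix": (727, 287722, 143444, 431893),
    "filter": (179, 3131, 791, 4101),
}

if __name__ == "__main__":
    for run, (i_miss, d_miss, d_wb, l2_acc) in RUNS.items():
        ok = expected_ul2_accesses(i_miss, d_miss, d_wb) == l2_acc
        print(f"{run}: ul2.accesses={l2_acc} matches il1+dl1 traffic: {ok}")
```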



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12576164 # total simulation time in cycles
sim_IPC                      1.7000 # instructions per cycle
sim_CPI                      0.5882 # cycles per instruction
sim_exec_BW                  1.7056 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49155014 # cumulative IFQ occupancy
IFQ_fcount                 11669581 # cumulative IFQ full count
ifq_occupancy                3.9086 # avg IFQ occupancy (insn's)
ifq_rate                     1.7056 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2916 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199756634 # cumulative RUU occupancy
RUU_fcount                 12449456 # cumulative RUU full count
ruu_occupancy               15.8837 # avg RUU occupancy (insn's)
ruu_rate                     1.7056 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3126 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64038863 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0921 # avg LSQ occupancy (insn's)
lsq_rate                     1.7056 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9855 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291578304 # total number of slip cycles
avg_sim_slip                13.6384 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto the ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
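
Most of the ratio statistics above appear to be simple quotients of the raw counters. The Python sketch below parses "name value # description" lines and recomputes several of them; the formulas are inferred from the stat descriptions and agree with the filter run's printed values to four decimals, but they are not taken from the simulator source. SAMPLE is an excerpt of the filter statistics, and parse_stats and derived are illustrative names:

```python
import re
from typing import Dict

# 'name   value # description'; hex addresses, '120k' and '<error: ...>' are skipped.
STAT = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?)\s+#")


def parse_stats(text: str) -> Dict[str, float]:
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        m = STAT.match(line.strip())
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats


def derived(s: Dict[str, float]) -> Dict[str, float]:
    """Recompute ratio statistics from raw counters (inferred definitions)."""
    cyc, insn, total = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    return {
        "sim_IPC": insn / cyc,
        "sim_CPI": cyc / insn,
        "sim_exec_BW": total / cyc,              # committed + mis-speculated
        "sim_IPB": insn / s["sim_num_branches"],
        "ifq_occupancy": s["IFQ_count"] / cyc,
        "ifq_latency": s["IFQ_count"] / total,
        "ifq_full": s["IFQ_fcount"] / cyc,
        "ruu_occupancy": s["RUU_count"] / cyc,
        "ruu_latency": s["RUU_count"] / total,
        "ruu_full": s["RUU_fcount"] / cyc,
        "avg_sim_slip": s["sim_slip"] / insn,
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_total_insn             21450114 # total number of instructions executed
sim_cycle                  12576164 # total simulation time in cycles
IFQ_count                  49155014 # cumulative IFQ occupancy
IFQ_fcount                 11669581 # cumulative IFQ full count
RUU_count                 199756634 # cumulative RUU occupancy
RUU_fcount                 12449456 # cumulative RUU full count
sim_slip                  291578304 # total number of slip cycles
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   120k # total size of memory pages allocated
"""

if __name__ == "__main__":
    for name, value in derived(parse_stats(SAMPLE)).items():
        print(f"{name:15s} {value:.4f}")
```

Running it should reproduce the logged filter values, for example sim_IPC 1.7000, ruu_occupancy 15.8837 and ul2.miss_rate 0.6069.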

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:18:05 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861243 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37568514 # total simulation time in cycles
sim_IPC                      0.7416 # instructions per cycle
sim_CPI                      1.3485 # cycles per instruction
sim_exec_BW                  0.7416 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 150198905 # cumulative IFQ occupancy
IFQ_fcount                 37549486 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7416 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3910 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 600798311 # cumulative RUU occupancy
RUU_fcount                 37548752 # cumulative RUU full count
ruu_occupancy               15.9921 # avg RUU occupancy (insn's)
ruu_rate                     0.7416 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.5639 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 182621278 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8610 # avg LSQ occupancy (insn's)
lsq_rate                     0.7416 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5547 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  819928252 # total number of slip cycles
avg_sim_slip                29.4306 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
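The statistics above are plain `name value # description` lines. Several of them are ratios of other counters:

- sim_IPC is sim_num_insn / sim_cycle, and sim_CPI is its inverse.
- dl1.miss_rate is dl1.misses / dl1.accesses.
- ifq_occupancy is IFQ_count / sim_cycle.
- avg_sim_slip is sim_slip / sim_num_insn.

The Python sketch below is not part of SimpleScalar. It parses such a block and recomputes those ratios so that a run can be sanity-checked. The regular expression and the helper names are illustrative assumptions based on the layout seen in this log.

```python
import re

# Matches "name   value # description"; the value may contain spaces
# (e.g. "<error: divide by zero>"), so match lazily up to the first " # ".
STAT_RE = re.compile(r"^(\S+)\s+(.*?)\s+#\s")


def to_number(text):
    """Convert a stat value to int (decimal or 0x-hex) or float; else keep the string."""
    try:
        return int(text, 0)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text


def parse_stats(log_text):
    """Collect the stats printed after 'simulation statistics' for one run."""
    stats = {}
    in_stats = False
    for line in log_text.splitlines():
        if line.startswith("sim: ** simulation statistics **"):
            in_stats = True
        elif in_stats and line.startswith("----"):
            break  # separator before the next run's banner
        elif in_stats:
            m = STAT_RE.match(line)
            if m:
                stats[m.group(1)] = to_number(m.group(2))
    return stats


def derived_metrics(s):
    """Recompute a few ratios that sim-outorder reports as formulas."""
    return {
        "IPC": s["sim_num_insn"] / s["sim_cycle"],
        "CPI": s["sim_cycle"] / s["sim_num_insn"],
        "dl1_miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ifq_occupancy": s["IFQ_count"] / s["sim_cycle"],
        "avg_sim_slip": s["sim_slip"] / s["sim_num_insn"],
    }


# Excerpt of the run above (values copied from the log).
SAMPLE = """sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_cycle                  37568514 # total simulation time in cycles
IFQ_count                 150198905 # cumulative IFQ occupancy
sim_slip                  819928252 # total number of slip cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                  1480k # total size of memory pages allocated
--------------------------------------------------------------------------------
"""

if __name__ == "__main__":
    for name, value in derived_metrics(parse_stats(SAMPLE)).items():
        print(f"{name:15s} {value:.4f}")
```

Rounded to four places, the recomputed values match the sim_IPC, sim_CPI, dl1.miss_rate, ifq_occupancy and avg_sim_slip lines printed for this run.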

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:18:29 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
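The -mem:lat and -mem:width options above set the main-memory timing. In stock SimpleScalar 3.0, sim-outorder charges the first-chunk latency, plus the inter-chunk latency for every additional bus-width chunk of the block being transferred.

The Python sketch below reproduces that formula, assuming this build keeps it unchanged. As far as I know, -mem:minBurstLength and -mem:maxBurstLength are not part of the stock 3.0 option set, so their effect is not modeled here.

```python
def mem_access_latency(blk_size, first_chunk, inter_chunk, bus_width):
    """Cycles to move one block from memory: first chunk + (chunks - 1) * inter-chunk."""
    if blk_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_size + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes in both configurations in this log.
print(mem_access_latency(64, 72, 3, 32))   # this run: -mem:lat 72 3, -mem:width 32 -> 75
print(mem_access_latency(64, 91, 10, 8))   # earlier run: -mem:lat 91 10, -mem:width 8 -> 161
```

Under that assumption, an ul2 block refill costs 75 cycles with this run's settings, versus 161 cycles with the earlier `-mem:lat 91 10 -mem:width 8` settings.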

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
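The range grammar above can be sketched as a small Python parser. This is an illustration, not SimpleScalar's own range code, and it makes two assumptions:

- It resolves program symbols through a caller-supplied dictionary. The address used for `main` in the tests is hypothetical.
- It rejects a range whose two ends carry different prefixes. This is an assumption about how such input should be treated.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Prefix on the start value selects the kind of range.
KINDS = {"@": "addr", "#": "cycle", "": "insn"}


@dataclass
class TraceRange:
    kind: str              # "addr", "cycle" or "insn"
    start: Optional[int]   # None means "from the beginning"
    end: Optional[int]     # None means "to the end"


def _split(token: str) -> Tuple[str, str]:
    """Separate a leading '@', '#' or '+' from the value text."""
    if token[:1] in ("@", "#", "+"):
        return token[0], token[1:]
    return "", token


def _value(text: str, symbols: Dict[str, int]) -> Optional[int]:
    if not text:
        return None
    if text in symbols:
        return symbols[text]
    return int(text, 0)  # decimal or 0x-prefixed hex


def parse_range(spec: str, symbols: Optional[Dict[str, int]] = None) -> TraceRange:
    symbols = symbols or {}
    if ":" not in spec:
        raise ValueError(f"missing ':' in range {spec!r}")
    lo, hi = spec.split(":", 1)
    lo_pfx, lo_txt = _split(lo)
    hi_pfx, hi_txt = _split(hi)
    if lo_pfx == "+":
        raise ValueError("'+' is only valid on the end of a range")
    start, end = _value(lo_txt, symbols), _value(hi_txt, symbols)
    if hi_pfx == "+":
        # End is relative to start, e.g. 1000:+100 == 1000:1100.
        if start is None or end is None:
            raise ValueError("'+<n>' needs both a start and an offset")
        return TraceRange(KINDS[lo_pfx], start, start + end)
    if lo_txt and hi_txt and lo_pfx != hi_pfx:
        raise ValueError("start and end use different range kinds")
    return TraceRange(KINDS[lo_pfx if lo_txt else hi_pfx], start, end)


print(parse_range("1000:+100"))  # TraceRange(kind='insn', start=1000, end=1100)
```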

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
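-bpred:2lev takes its arguments in the order <l1size> <l2size> <hist_size> <xor>, that is N, M, W, X. The sample predictors above, however, are written N, W, M, X. The sketch below is a hypothetical helper, not shipped with SimpleScalar, that emits the option string in command-line order.

In that order, the default `-bpred:2lev 1 1024 8 0` listed earlier is a GAp configuration, since 1024 > 2^8. PAp is left out: the listed size 2^(N+W) only works if N is read as log2 of the first-level size, and the help text does not say that.

```python
def twolev_args(scheme, w, n=1, m=None):
    """Return '<l1size> <l2size> <hist_size> <xor>' for -bpred:2lev.

    w is the history width W, n the first-level size N, m the second-level size M.
    """
    if scheme == "GAg":
        n, m, xor = 1, 2 ** w, 0
    elif scheme == "GAp":
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        n, xor = 1, 0
    elif scheme == "PAg":
        m, xor = 2 ** w, 0
    elif scheme == "gshare":
        n, m, xor = 1, 2 ** w, 1
    else:
        raise ValueError(f"unsupported scheme {scheme!r}")
    return f"{n} {m} {w} {xor}"


print(f"-bpred 2lev -bpred:2lev {twolev_args('gshare', 12)}")
```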

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r
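A cache's capacity follows from the <config> fields as nsets * bsize * assoc bytes. The Python sketch below parses the format and computes that product. For this run, dl1:512:64:2:l and il1:512:64:2:l come out at 64 KB each, and ul2:1024:64:8:l at 512 KB.

For the TLB entries, the same product gives the amount of memory mapped, since bsize is the page size there. The power-of-two checks mirror what I understand the simulator to enforce, and should be treated as an assumption.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


def _is_pow2(x):
    return x > 0 and (x & (x - 1)) == 0


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    nsets, bsize, assoc = int(nsets), int(bsize), int(assoc)
    if not (_is_pow2(nsets) and _is_pow2(bsize)):
        raise ValueError("nsets and bsize are assumed to be powers of two")
    if assoc <= 0:
        raise ValueError("associativity must be positive")
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, nsets, bsize, assoc, REPLACEMENT[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, {cfg.repl}")
```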

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
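This run uses -cache:il2 dl2, so instruction-side L2 traffic goes to the ul2 cache. That is why the statistics list il1, dl1 and ul2 but no separate il2.

The Python sketch below resolves the pointer arguments to the cache names that will appear in the statistics. The function and dictionary keys are illustrative. It accepts only the pointer targets listed in the option help: dl1 or dl2 for il1, and dl2 for il2.

```python
def cache_name(spec):
    """Name field of a '<name>:...' config, or None for 'none'."""
    return None if spec == "none" else spec.split(":", 1)[0]


def resolve_hierarchy(il1, il2, dl1, dl2):
    """Map each cache level to the name of the cache object that serves it."""
    d1, d2 = cache_name(dl1), cache_name(dl2)
    pointers_il1 = {"dl1": d1, "dl2": d2}  # il1 may point at dl1 or dl2
    pointers_il2 = {"dl2": d2}             # il2 may point at dl2 only
    return {
        "il1": pointers_il1[il1] if il1 in pointers_il1 else cache_name(il1),
        "il2": pointers_il2[il2] if il2 in pointers_il2 else cache_name(il2),
        "dl1": d1,
        "dl2": d2,
    }


# Options from this run.
levels = resolve_hierarchy("il1:512:64:2:l", "dl2", "dl1:512:64:2:l", "ul2:1024:64:8:l")
print(levels)                       # {'il1': 'il1', 'il2': 'ul2', 'dl1': 'dl1', 'dl2': 'ul2'}
print(sorted(set(levels.values()))) # ['dl1', 'il1', 'ul2']
```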



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6942459 # total simulation time in cycles
sim_IPC                      1.8880 # instructions per cycle
sim_CPI                      0.5297 # cycles per instruction
sim_exec_BW                  1.8940 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25837984 # cumulative IFQ occupancy
IFQ_fcount                  6333893 # cumulative IFQ full count
ifq_occupancy                3.7217 # avg IFQ occupancy (insn's)
ifq_rate                     1.8940 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9650 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 106635521 # cumulative RUU occupancy
RUU_fcount                  5770054 # cumulative RUU full count
ruu_occupancy               15.3599 # avg RUU occupancy (insn's)
ruu_rate                     1.8940 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.1098 # avg RUU occupant latency (cycle's)
ruu_full                     0.8311 # fraction of time (cycle's) RUU was full
LSQ_count                  32635962 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7009 # avg LSQ occupancy (insn's)
lsq_rate                     1.8940 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4820 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  156321425 # total number of slip cycles
avg_sim_slip                11.9264 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
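The `<error: divide by zero>` entry for bpred_bimod.bpred_jr_non_ras_rate.PP in this run is not a simulator fault. The rate is jr_non_ras_hits / jr_non_ras_seen, and the predictor saw no non-RAS JRs (jr_non_ras_seen.PP is 0).

The Python sketch below recomputes the predictor rates from the counters. It returns None where the denominator is zero, and prints that case the same way the log does. The helper names are illustrative.

```python
from typing import Optional


def rate(hits: int, total: int) -> Optional[float]:
    """hits / total, or None when total is zero (the ratio is undefined)."""
    return None if total == 0 else hits / total


def fmt(value: Optional[float]) -> str:
    return "<error: divide by zero>" if value is None else f"{value:.4f}"


# Counters copied from the bpred_bimod block of this run.
counters = {
    "updates": 1010850,
    "addr_hits": 1000567,
    "dir_hits": 1000659,
    "jr_hits": 46,
    "jr_seen": 52,
    "jr_non_ras_hits": 0,
    "jr_non_ras_seen": 0,
    "used_ras": 52,
    "ras_hits": 46,
}

rates = {
    "bpred_addr_rate": rate(counters["addr_hits"], counters["updates"]),
    "bpred_dir_rate": rate(counters["dir_hits"], counters["updates"]),
    "bpred_jr_rate": rate(counters["jr_hits"], counters["jr_seen"]),
    "bpred_jr_non_ras_rate": rate(counters["jr_non_ras_hits"], counters["jr_non_ras_seen"]),
    "ras_rate": rate(counters["ras_hits"], counters["used_ras"]),
}

for name, value in rates.items():
    print(f"{name:24s} {fmt(value)}")
```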

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:18:37 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)




sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264954 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958479.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6535600 # total simulation time in cycles
sim_IPC                      1.7721 # instructions per cycle
sim_CPI                      0.5643 # cycles per instruction
sim_exec_BW                  1.8766 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19397213 # cumulative IFQ occupancy
IFQ_fcount                  4021144 # cumulative IFQ full count
ifq_occupancy                2.9679 # avg IFQ occupancy (insn's)
ifq_rate                     1.8766 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5815 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6153 # fraction of time (cycle's) IFQ was full
RUU_count                  79836389 # cumulative RUU occupancy
RUU_fcount                  3419160 # cumulative RUU full count
ruu_occupancy               12.2156 # avg RUU occupancy (insn's)
ruu_rate                     1.8766 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.5093 # avg RUU occupant latency (cycle's)
ruu_full                     0.5232 # fraction of time (cycle's) RUU was full
LSQ_count                  32706132 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0043 # avg LSQ occupancy (insn's)
lsq_rate                     1.8766 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6666 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126609630 # total number of slip cycles
avg_sim_slip                10.9320 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820213 # total number of accesses
il1.hits                   12819996 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820213 # total number of accesses
itlb.hits                  12820206 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917910 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
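The derived statistics in the run above (sim_IPC, sim_CPI, sim_IPB, avg_sim_slip, the bpred rates and the cache miss rates) are ratios of raw counters printed in the same dump. The following is a minimal Python sketch that recomputes them. It assumes each stat line has the shape "<name> <value> # <comment>". parse_stats and derived are hypothetical names used only for illustration and are not part of SimpleScalar. The counter values are copied from the dump above.

```python
# Minimal sketch: parse sim-outorder "name value # comment" stat lines and
# recompute derived statistics from the raw counters of the run above.
# Assumption: every stat line has the shape "<name> <value> # <comment>".

STATS_TEXT = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_branches            3128464 # total number of branches committed
sim_cycle                   6535600 # total simulation time in cycles
sim_slip                  126609630 # total number of slip cycles
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
dl1.accesses                4497783 # total number of accesses
dl1.misses                    11205 # total number of misses
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
"""


def parse_stats(text):
    """Return {stat_name: float} for every 'name value # comment' line."""
    stats = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(body) != 2:
            continue  # blank lines, banners, multi-valued options
        name, value = body
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # hex addresses such as 0x00400000, or sizes like "292k"
    return stats


s = parse_stats(STATS_TEXT)
derived = {
    "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
    "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
    "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
    "avg_sim_slip": s["sim_slip"] / s["sim_num_insn"],
    "bpred_addr_rate": s["bpred_bimod.addr_hits"] / s["bpred_bimod.updates"],
    "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
    "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
}
for name, value in derived.items():
    print(f"{name:16s} {value:.4f}")
```

Each recomputed value agrees with the corresponding printed statistic to four decimal places. The same parser can be pointed at any other run in this file.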

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput fft 
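Runs in this file share most of their options. This command line differs from the first run in the file (matrix) only in a few memory-system flags and in the benchmark name. The small Python sketch below diffs the two command lines. parse_cmdline is a hypothetical helper. It assumes every option is a "-flag" followed by its arguments and that the last token is the simulated program, which holds for these two lines.

```python
# Sketch: report which options differ between two sim-outorder command lines.
# Assumptions: each option is "-flag" followed by zero or more non-dash
# arguments, and the final token is the simulated program.
import shlex


def parse_cmdline(cmd):
    """Split a command line into (program, {flag: "args"})."""
    tokens = shlex.split(cmd)
    program = tokens[-1]
    opts = {}
    flag = None
    for tok in tokens[1:-1]:  # skip "sim-outorder" and the program name
        if tok.startswith("-"):
            flag = tok
            opts[flag] = []
        else:
            opts[flag].append(tok)
    return program, {f: " ".join(args) for f, args in opts.items()}


FIRST_RUN = (
    "sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 "
    "-lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l "
    "-cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 "
    "-cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 "
    "-mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2 "
    "-redir:sim tempOutput matrix"
)
FFT_RUN = (
    "sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 "
    "-lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l "
    "-cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 "
    "-cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 "
    "-mem:width 32 -mem:lat 72 3 -mem:minBurstLength 4 "
    "-redir:sim tempOutput fft"
)

prog_a, opts_a = parse_cmdline(FIRST_RUN)
prog_b, opts_b = parse_cmdline(FFT_RUN)
changed = {
    flag: (opts_a.get(flag), opts_b.get(flag))
    for flag in sorted(opts_a.keys() | opts_b.keys())
    if opts_a.get(flag) != opts_b.get(flag)
}
print(prog_a, "->", prog_b)
for flag, (old, new) in changed.items():
    print(f"{flag}: {old} -> {new}")
```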

sim: simulation started @ Thu Dec 15 11:18:44 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
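With -mem:width 32 and -mem:lat 72 3, refilling one 64-byte ul2 block from memory takes two bus-width chunks. The following is a minimal Python sketch. It assumes this build keeps the per-block latency formula of stock SimpleScalar 3.0 sim-outorder (its mem_access_latency function): the first-chunk latency, plus the inter-chunk latency for each additional chunk. The -mem:minBurstLength and -mem:maxBurstLength options appear to be local additions to this build, so the sketch does not model their effect.

```python
# Sketch of the per-block main-memory latency model of stock SimpleScalar 3.0
# sim-outorder: the first chunk costs <first_chunk> cycles and each further
# bus-width chunk costs <inter_chunk> cycles.
# Assumption: the -mem:minBurstLength / -mem:maxBurstLength options of this
# build are NOT modelled here.


def mem_access_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to transfer one cache block from main memory."""
    chunks = (block_bytes + bus_width - 1) // bus_width  # ceiling division
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)


UL2_BLOCK = 64  # block size of ul2:1024:64:8:l, the level that misses to memory

# This run: -mem:width 32 -mem:lat 72 3
wide = mem_access_latency(UL2_BLOCK, 32, 72, 3)
# The first run in this file: -mem:width 8 -mem:lat 91 10
narrow = mem_access_latency(UL2_BLOCK, 8, 91, 10)
print(wide, narrow)
```

Under this model, an ul2 miss costs 72 + 3 = 75 cycles here, compared with 91 + 7 × 10 = 161 cycles on the narrower 8-byte bus.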



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375617 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10532381 # total simulation time in cycles
sim_IPC                      1.2648 # instructions per cycle
sim_CPI                      0.7907 # cycles per instruction
sim_exec_BW                  1.2700 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  40979678 # cumulative IFQ occupancy
IFQ_fcount                 10094275 # cumulative IFQ full count
ifq_occupancy                3.8908 # avg IFQ occupancy (insn's)
ifq_rate                     1.2700 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.0638 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9584 # fraction of time (cycle's) IFQ was full
RUU_count                 164845908 # cumulative RUU occupancy
RUU_fcount                  9953233 # cumulative RUU full count
ruu_occupancy               15.6513 # avg RUU occupancy (insn's)
ruu_rate                     1.2700 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.3244 # avg RUU occupant latency (cycle's)
ruu_full                     0.9450 # fraction of time (cycle's) RUU was full
LSQ_count                  87869122 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3428 # avg LSQ occupancy (insn's)
lsq_rate                     1.2700 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5694 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  272429127 # total number of slip cycles
avg_sim_slip                20.4512 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
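The IFQ, RUU and LSQ figures in the fft run above are averages over cumulative counters. Matching them against the printed values suggests the following formulas:

- occupancy = *_count / sim_cycle
- rate = sim_total_insn / sim_cycle
- latency = *_count / sim_total_insn
- full = *_fcount / sim_cycle

Under these formulas, occupancy = rate × latency (Little's law) holds by construction. The following Python sketch recomputes the fft queue statistics using these inferred formulas. They were inferred from the printed numbers and were not taken from the simulator source.

```python
# Recompute the fft run's queue statistics from its cumulative counters.
# The formulas are inferred by matching the printed values, not copied from
# the simulator source.

sim_cycle = 10532381        # total simulation time in cycles
sim_total_insn = 13375617   # instructions executed (committed + mis-speculated)

# queue name: (cumulative occupancy counter, cumulative "full" counter)
queues = {
    "ifq": (40979678, 10094275),
    "ruu": (164845908, 9953233),
    "lsq": (87869122, 0),
}

results = {}
for name, (count, fcount) in queues.items():
    occupancy = count / sim_cycle        # avg entries resident per cycle
    rate = sim_total_insn / sim_cycle    # avg instructions dispatched per cycle
    latency = count / sim_total_insn     # avg cycles an instruction stays queued
    full = fcount / sim_cycle            # fraction of cycles the queue was full
    # Little's law (occupancy = rate * latency) holds by construction.
    assert abs(occupancy - rate * latency) < 1e-9
    results[name] = (occupancy, rate, latency, full)
    print(f"{name}: occupancy={occupancy:.4f} rate={rate:.4f} "
          f"latency={latency:.4f} full={full:.4f}")
```

In this run, the RUU holds about 15.65 of its 16 entries (-ruu:size 16) on average and is full in roughly 94.5% of cycles. The 32-entry LSQ, by contrast, is never full. This suggests the RUU, rather than the LSQ, limits the instruction window.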

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:18:56 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
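Each cache and TLB option in the dump above uses the <name>:<nsets>:<bsize>:<assoc>:<repl> format described in the simulator's help text. The data capacity, excluding tags, is therefore nsets × bsize × assoc. For the TLBs, bsize is the page size (4096 here), so the same product gives the TLB reach. The following is a short Python sketch. CacheConfig and its methods are hypothetical names used for illustration.

```python
# Sketch: decode "<name>:<nsets>:<bsize>:<assoc>:<repl>" cache/TLB specs from
# the option dump and compute their capacity (nsets * bsize * assoc).
# For a TLB, <bsize> is the page size, so the product is the TLB reach.
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @classmethod
    def parse(cls, spec: str) -> "CacheConfig":
        name, nsets, bsize, assoc, repl = spec.split(":")
        return cls(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc


configs = [CacheConfig.parse(spec) for spec in (
    "dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
    "itlb:16:4096:4:l", "dtlb:32:4096:4:l",
)]
for c in configs:
    print(f"{c.name:5s} {c.capacity_bytes // 1024:4d} KiB, "
          f"{c.nsets * c.assoc} entries, {c.assoc}-way {c.repl}")
```

For this configuration, the results are:

- dl1 and il1: 64 KiB each
- ul2: 512 KiB
- itlb: 64 entries, covering 256 KiB
- dtlb: 128 entries, covering 512 KiB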



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12717662 # total simulation time in cycles
sim_IPC                      1.6811 # instructions per cycle
sim_CPI                      0.5949 # cycles per instruction
sim_exec_BW                  1.6866 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49677158 # cumulative IFQ occupancy
IFQ_fcount                 11800117 # cumulative IFQ full count
ifq_occupancy                3.9062 # avg IFQ occupancy (insn's)
ifq_rate                     1.6866 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3159 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 201848297 # cumulative RUU occupancy
RUU_fcount                 12579992 # cumulative RUU full count
ruu_occupancy               15.8715 # avg RUU occupancy (insn's)
ruu_rate                     1.6866 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.4101 # avg RUU occupant latency (cycle's)
ruu_full                     0.9892 # fraction of time (cycle's) RUU was full
LSQ_count                  64694819 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0870 # avg LSQ occupancy (insn's)
lsq_rate                     1.6866 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0161 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  294325545 # total number of slip cycles
avg_sim_slip                13.7669 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
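Many of the rate and average lines above are ratios of the raw counters. For example, sim_IPC is sim_num_insn / sim_cycle, sim_IPB is sim_num_insn / sim_num_branches, and ifq_occupancy is IFQ_count / sim_cycle. The sketch below parses the `name value # description` layout and recomputes a few of these ratios, which can serve as a sanity check when comparing runs. The regular expression and helper names are assumptions about the dump format, not a SimpleScalar API. Non-numeric values (hex bases, `120k`, `<error: divide by zero>`) are kept as strings.

```python
import re

# One statistic per line: "<name> <value> # <description>".
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#\s*(.*)$")


def parse_stats(text: str) -> dict:
    """Map stat names to floats where possible, otherwise to the raw string."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # banners, blank lines, "sim: ..." messages
        name, raw, _description = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. 0x00400000, 120k, <error: divide by zero>
    return stats


def derived_metrics(s: dict) -> dict:
    """Recompute a few derived statistics from the raw counters."""
    cycles = s["sim_cycle"]
    insns = s["sim_num_insn"]
    return {
        "IPC": insns / cycles,
        "CPI": cycles / insns,
        "IPB": insns / s["sim_num_branches"],
        "ifq_occupancy": s["IFQ_count"] / cycles,
        "ruu_occupancy": s["RUU_count"] / cycles,
        "ul2_miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


# Counters copied from the statistics block above.
SAMPLE = """
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_cycle                  12717662 # total simulation time in cycles
IFQ_count                  49677158 # cumulative IFQ occupancy
RUU_count                 201848297 # cumulative RUU occupancy
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

for key, value in derived_metrics(parse_stats(SAMPLE)).items():
    print(f"{key:15s} {value:.4f}")
```

Rounded to four places, these reproduce the printed sim_IPC (1.6811), sim_CPI (0.5949), sim_IPB (12.9780), ifq_occupancy (3.9062), ruu_occupancy (15.8715) and ul2.miss_rate (0.6069).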

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:19:09 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
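The two -mem:lat values are a first-chunk latency and an inter-chunk latency. In stock SimpleScalar 3.0, sim-outorder charges a block transfer from memory as first_chunk + inter_chunk x (chunks - 1), where chunks is the block size divided by -mem:width, rounded up. The sketch below applies that stock formula to the 64-byte ul2 blocks used here. Note that -mem:maxBurstLength and -mem:minBurstLength are not options of the stock 3.0 release. This sketch does not model this build's burst handling, so the simulator's actual latencies may differ.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block from memory, stock SimpleScalar 3.0 style.

    The burst-length options of this modified build are deliberately ignored.
    """
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_size // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 uses 64-byte blocks (ul2:1024:64:8:l) in the runs in this log.
print(mem_access_latency(64, 32, 72, 3))  # this run: -mem:width 32 -mem:lat 72 3 -> 75
print(mem_access_latency(64, 64, 72, 3))  # -mem:width 64 -mem:lat 72 3 -> 72
```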

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
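The range grammar is small enough to parse by hand. The sketch below uses a hypothetical `TraceRange` type and `parse_ptrace_range` function. It assumes numeric values are decimal unless written with a 0x prefix. Program symbols such as @main are not resolved and raise ValueError. The sketch also rejects a range whose two ends use different types (for example `@` and `#`); this is a choice of the sketch, not documented simulator behavior.

```python
from dataclasses import dataclass
from typing import Optional

# Range-type prefixes from the help text; no prefix means an instruction count.
KINDS = {"@": "address", "#": "cycle", "": "instruction"}


@dataclass(frozen=True)
class TraceRange:
    kind: str
    start: Optional[int]  # None: from the beginning of execution
    end: Optional[int]    # None: to the end of execution


def _split_prefix(token: str, allowed: str):
    """Split a leading type character (if any) from its value."""
    if token and token[0] in allowed:
        return token[0], token[1:]
    return "", token


def parse_ptrace_range(spec: str) -> TraceRange:
    start_tok, sep, end_tok = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start_kind, start_txt = _split_prefix(start_tok, "@#")
    end_kind, end_txt = _split_prefix(end_tok, "@#+")
    # Base 0 accepts decimal and 0x-prefixed hex; symbols are not handled.
    start = int(start_txt, 0) if start_txt else None
    end = int(end_txt, 0) if end_txt else None

    if end_kind == "+":
        if start is None or end is None:
            raise ValueError("a '+' end needs both a start and an offset")
        end, end_kind = start + end, start_kind  # relative to the start
    elif start_tok and end_tok and start_kind != end_kind:
        raise ValueError("start and end use different range types")

    kind = start_kind if start_tok else end_kind
    return TraceRange(KINDS[kind], start, end)


for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
    print(spec, parse_ptrace_range(spec))
```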

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
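The sample table lists parameters in N, W, M, X order, but the -bpred:2lev option takes <l1size> <l2size> <hist_size> <xor>, which is N, M, W, X. The sketch below builds the option string in the option's own order and names the sample scheme that a setting matches. The helper names are hypothetical. PAp is matched loosely, as any N > 1 with M > 2^W, rather than by the table's exact M == 2^(N+W). These runs use -bpred bimod, so their 2lev setting has no effect.

```python
from typing import Optional


def two_level_option(n: int, w: int, m: int, xor: bool) -> str:
    """Format N, W, M, X in the option's own order: l1size l2size hist_size xor."""
    return f"-bpred:2lev {n} {m} {w} {int(xor)}"


def classify_two_level(n: int, w: int, m: int, xor: bool) -> Optional[str]:
    """Name the sample predictor a configuration matches, or None."""
    patterns = 2 ** w  # counters reachable by a W-bit history alone
    if n == 1 and m == patterns:
        return "gshare" if xor else "GAg"
    if xor:
        return None  # xor indexing only appears in the gshare sample
    if n == 1 and m > patterns:
        return "GAp"
    if n > 1 and m == patterns:
        return "PAg"
    if n > 1 and m > patterns:
        return "PAp"  # looser than the table's M == 2^(N+W)
    return None


# The listed default "-bpred:2lev 1 1024 8 0" reads as N=1, M=1024, W=8, X=0.
print(classify_two_level(n=1, w=8, m=1024, xor=False))  # GAp
print(two_level_option(n=1, w=10, m=1024, xor=True))    # -bpred:2lev 1 1024 10 1
```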

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861747 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  40417941 # total simulation time in cycles
sim_IPC                      0.6893 # instructions per cycle
sim_CPI                      1.4508 # cycles per instruction
sim_exec_BW                  0.6893 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 161543945 # cumulative IFQ occupancy
IFQ_fcount                 40385746 # cumulative IFQ full count
ifq_occupancy                3.9968 # avg IFQ occupancy (insn's)
ifq_rate                     0.6893 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.7981 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 646181180 # cumulative RUU occupancy
RUU_fcount                 40384886 # cumulative RUU full count
ruu_occupancy               15.9875 # avg RUU occupancy (insn's)
ruu_rate                     0.6893 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.1924 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 196333291 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8576 # avg LSQ occupancy (insn's)
lsq_rate                     0.6893 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.0467 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  879022882 # total number of slip cycles
avg_sim_slip                31.5518 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:19:33 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6801123 # total simulation time in cycles
sim_IPC                      1.9272 # instructions per cycle
sim_CPI                      0.5189 # cycles per instruction
sim_exec_BW                  1.9334 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25323040 # cumulative IFQ occupancy
IFQ_fcount                  6205157 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insn's)
ifq_rate                     1.9334 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9259 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104570129 # cumulative RUU occupancy
RUU_fcount                  5641318 # cumulative RUU full count
ruu_occupancy               15.3754 # avg RUU occupancy (insn's)
ruu_rate                     1.9334 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9527 # avg RUU occupant latency (cycle's)
ruu_full                     0.8295 # fraction of time (cycle's) RUU was full
LSQ_count                  31945914 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6972 # avg LSQ occupancy (insn's)
lsq_rate                     1.9334 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4295 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153567281 # total number of slip cycles
avg_sim_slip                11.7163 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
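Several of the statistics above are ratios of the raw counters, which makes them a useful consistency check when comparing runs. The Python sketch below is a hypothetical helper, not a SimpleScalar tool. It reads "name value # description" lines and skips values that are not plain numbers (hex addresses, 96k, <error: divide by zero>). It then recomputes a few of the derived statistics. The queue latencies follow Little's law: average occupancy divided by dispatch rate, which reduces to cumulative occupancy divided by sim_total_insn.

```python
import re
from typing import Dict, Iterable

# "<name> <value> # <description>"; the value must be a single token.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(lines: Iterable[str]) -> Dict[str, float]:
    stats: Dict[str, float] = {}
    for line in lines:
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            pass  # hex addresses, sizes like "96k", non-numeric option values
    return stats


def derived_metrics(s: Dict[str, float]) -> Dict[str, float]:
    return {
        "IPC": s["sim_num_insn"] / s["sim_cycle"],
        "CPI": s["sim_cycle"] / s["sim_num_insn"],
        "IPB": s["sim_num_insn"] / s["sim_num_branches"],
        # Little's law: latency = occupancy / rate = cumulative count / insts
        "ifq_latency": s["IFQ_count"] / s["sim_total_insn"],
        "ruu_latency": s["RUU_count"] / s["sim_total_insn"],
        "dl1_miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    }
```

For the run whose statistics appear above, this reproduces the printed IPC 1.9272, CPI 0.5189, IPB 12.9664, ifq_latency 1.9259 and ruu_latency 7.9527. If several runs are concatenated in one -redir:sim file (tempOutput in these command lines), later values overwrite earlier ones, so split the file on the dashed separator first.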

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:19:41 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
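The -mem:lat and -mem:width options combine into a per-block memory latency. The sketch below assumes the stock SimpleScalar 3.0 sim-outorder model. In that model, a block moves in ceil(block size / bus width) chunks; the first chunk costs <first_chunk> cycles and each later one costs <inter_chunk> cycles. This build also accepts -mem:maxBurstLength and -mem:minBurstLength, which are not part of the stock release, so their effect is unknown here and ignored.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from memory (assumed stock sim-outorder model)."""
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_size // bus_width)   # ceiling division: bus transfers per block
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte L2 blocks (ul2:1024:64:8:l) under the two memory settings in this log
print(mem_access_latency(64, 8, 91, 10))   # matrix command line: -mem:width 8  -mem:lat 91 10
print(mem_access_latency(64, 64, 72, 3))   # sort run:            -mem:width 64 -mem:lat 72 3
```

Under this model, the matrix settings cost 161 cycles per block (8 chunks) and the sort settings 72 cycles (a single chunk).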


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264665 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6339581 # total simulation time in cycles
sim_IPC                      1.8269 # instructions per cycle
sim_CPI                      0.5474 # cycles per instruction
sim_exec_BW                  1.9346 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18674483 # cumulative IFQ occupancy
IFQ_fcount                  3840461 # cumulative IFQ full count
ifq_occupancy                2.9457 # avg IFQ occupancy (insn's)
ifq_rate                     1.9346 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5226 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6058 # fraction of time (cycle's) IFQ was full
RUU_count                  76943818 # cumulative RUU occupancy
RUU_fcount                  3238545 # cumulative RUU full count
ruu_occupancy               12.1371 # avg RUU occupancy (insn's)
ruu_rate                     1.9346 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2736 # avg RUU occupant latency (cycle's)
ruu_full                     0.5108 # fraction of time (cycle's) RUU was full
LSQ_count                  32053684 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0561 # avg LSQ occupancy (insn's)
lsq_rate                     1.9346 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6135 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123064598 # total number of slip cycles
avg_sim_slip                10.6260 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
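A quick way to relate the sort run's cache statistics to its latency settings is the textbook average memory access time, AMAT = t_L1 + m_L1 x (t_L2 + m_L2 x t_mem). This is only an approximation and not something sim-outorder reports. It ignores out-of-order overlap, TLB misses (-tlb:lat 30) and writeback traffic. The 72-cycle memory latency assumes one 64-byte chunk per block, as in the stock latency model. Also note that ul2.accesses (18501) equals dl1.misses + dl1.writebacks + il1.misses (11205 + 7079 + 217). The unified L2 miss rate therefore mixes data, writeback and instruction traffic.

```python
def amat(l1_hit: float, l1_miss_rate: float, l2_hit: float,
         l2_miss_rate: float, mem_latency: float) -> float:
    """Textbook average memory access time for a two-level hierarchy (cycles)."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Values copied from the sort run above.
dl1_miss_rate = 11205 / 4497784    # dl1.misses / dl1.accesses
ul2_miss_rate = 4196 / 18501       # ul2.misses / ul2.accesses
estimate = amat(3, dl1_miss_rate, 12, ul2_miss_rate, 72)  # dl1lat, dl2lat, memory
print(f"estimated dl1 AMAT: {estimate:.2f} cycles")       # prints 3.07
```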

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:19:49 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
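Below is a small parser for the <config> string, assuming <bsize> is in bytes. `parse_cache_config` and `CacheConfig` are hypothetical names. The power-of-two check on <nsets> and <bsize> is an assumption about what the simulator accepts. Capacity is sets × block size × ways. For a TLB, where the "block" is a page, the same product gives the amount of memory the TLB maps.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Bytes held (or, for a TLB, bytes mapped): sets * block * ways."""
        return self.nsets * self.bsize * self.assoc


def _is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)):
        raise ValueError(f"sets and block size must be powers of two in {spec!r}")
    return cfg


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity // 1024, "KiB", cfg.repl)
```

For the configurations used in the runs below:

- `dl1:512:64:2:l` works out to 64 KiB.
- `ul2:1024:64:8:l` works out to 512 KiB.
- The `itlb:16:4096:4:l` TLB maps 256 KiB.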



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375044 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9180775 # total simulation time in cycles
sim_IPC                      1.4510 # instructions per cycle
sim_CPI                      0.6892 # cycles per instruction
sim_exec_BW                  1.4569 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  35778891 # cumulative IFQ occupancy
IFQ_fcount                  8794078 # cumulative IFQ full count
ifq_occupancy                3.8972 # avg IFQ occupancy (insns)
ifq_rate                     1.4569 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6750 # avg IFQ occupant latency (cycles)
ifq_full                     0.9579 # fraction of time (cycles) IFQ was full
RUU_count                 144025965 # cumulative RUU occupancy
RUU_fcount                  8653177 # cumulative RUU full count
ruu_occupancy               15.6878 # avg RUU occupancy (insns)
ruu_rate                     1.4569 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.7683 # avg RUU occupant latency (cycles)
ruu_full                     0.9425 # fraction of time (cycles) RUU was full
LSQ_count                  75479055 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2214 # avg LSQ occupancy (insns)
lsq_rate                     1.4569 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6433 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  239220546 # total number of slip cycles
avg_sim_slip                17.9583 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen           89860 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
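Most of the rates above are ratios of the cumulative counters printed next to them. The Python sketch below does two things:

- It parses `name value # description` lines.
- It recomputes those ratios from the counters of the run above, flagging any that disagree with the printed four-decimal value.

`parse_stats`, `DERIVED` and `mismatches` are hypothetical names. The formulas are inferred from the stat descriptions and checked against this run's numbers; they are not copied from sim-outorder's source.

```python
import re
from typing import Callable, Dict, List, Union

# "name   value   # description" (the description is optional here).
STAT_LINE = re.compile(r"^(\S+)\s+(.+?)\s*(?:#.*)?$")


def parse_stats(text: str) -> Dict[str, Union[float, str]]:
    """Parse a sim-outorder statistics dump into {name: value}."""
    stats: Dict[str, Union[float, str]] = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # blank line
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. "0x00400000", "760k", "<error: divide by zero>"
    return stats


# Derived statistics recomputed from the cumulative counters.
DERIVED: Dict[str, Callable[[Dict], float]] = {
    "sim_IPC": lambda s: s["sim_num_insn"] / s["sim_cycle"],
    "sim_CPI": lambda s: s["sim_cycle"] / s["sim_num_insn"],
    "sim_exec_BW": lambda s: s["sim_total_insn"] / s["sim_cycle"],
    "sim_IPB": lambda s: s["sim_num_insn"] / s["sim_num_branches"],
    "ifq_occupancy": lambda s: s["IFQ_count"] / s["sim_cycle"],
    "ifq_latency": lambda s: s["IFQ_count"] / s["sim_total_insn"],
    "ifq_full": lambda s: s["IFQ_fcount"] / s["sim_cycle"],
    "ruu_occupancy": lambda s: s["RUU_count"] / s["sim_cycle"],
    "ruu_latency": lambda s: s["RUU_count"] / s["sim_total_insn"],
    "ruu_full": lambda s: s["RUU_fcount"] / s["sim_cycle"],
    "lsq_occupancy": lambda s: s["LSQ_count"] / s["sim_cycle"],
    "lsq_latency": lambda s: s["LSQ_count"] / s["sim_total_insn"],
    "avg_sim_slip": lambda s: s["sim_slip"] / s["sim_num_insn"],
    "bpred_bimod.bpred_addr_rate":
        lambda s: s["bpred_bimod.addr_hits"] / s["bpred_bimod.updates"],
    "bpred_bimod.bpred_dir_rate":
        lambda s: s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
    "bpred_bimod.bpred_jr_rate":
        lambda s: s["bpred_bimod.jr_hits"] / s["bpred_bimod.jr_seen"],
}


def mismatches(stats: Dict, tol: float = 5e-5) -> List[str]:
    """Names whose printed (4-decimal) value disagrees with the recomputation."""
    return [n for n, f in DERIVED.items() if abs(f(stats) - stats[n]) > tol]


# Counters and printed values copied from the run above.
RUN = """
sim_num_insn               13320908
sim_num_branches             387182
sim_total_insn             13375044
sim_cycle                   9180775
sim_IPC                      1.4510
sim_CPI                      0.6892
sim_exec_BW                  1.4569
sim_IPB                     34.4048
IFQ_count                  35778891
IFQ_fcount                  8794078
ifq_occupancy                3.8972
ifq_latency                  2.6750
ifq_full                     0.9579
RUU_count                 144025965
RUU_fcount                  8653177
ruu_occupancy               15.6878
ruu_latency                 10.7683
ruu_full                     0.9425
LSQ_count                  75479055
lsq_occupancy                8.2214
lsq_latency                  5.6433
sim_slip                  239220546
avg_sim_slip                17.9583
bpred_bimod.updates          387182
bpred_bimod.addr_hits        380510
bpred_bimod.dir_hits         380716
bpred_bimod.jr_hits           89850
bpred_bimod.jr_seen           89860
bpred_bimod.bpred_addr_rate    0.9828
bpred_bimod.bpred_dir_rate    0.9833
bpred_bimod.bpred_jr_rate    0.9999
"""

print(mismatches(parse_stats(RUN)))
```

For this run the list comes back empty: every recomputed value agrees with the printed one to four decimals. The latencies divide by executed instructions (`sim_total_insn`) rather than committed ones, which is why they differ slightly from occupancy × CPI.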

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:20:00 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
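In stock SimpleScalar 3.0, sim-outorder charges a main-memory block fetch as the first-chunk latency plus one inter-chunk latency for each additional bus-width chunk.

This build also accepts `-mem:minBurstLength` and `-mem:maxBurstLength`, which appear to be local additions. How they change the latency is not documented in this log, so the sketch below leaves them out. `mem_access_latency` is a hypothetical helper modeled on that stock formula.

```python
def mem_access_latency(blk_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to bring one cache block in from memory.

    The block moves in ceil(blk_bytes / bus_width) bus-width chunks: the
    first chunk costs first_chunk cycles, each later one inter_chunk cycles.
    Burst-length limits (-mem:minBurstLength/-mem:maxBurstLength) are ignored.
    """
    chunks = -(-blk_bytes // bus_width)  # ceiling division
    if chunks <= 0:
        raise ValueError("block size and bus width must be positive")
    return first_chunk + inter_chunk * (chunks - 1)


# This run: 64-byte ul2 blocks, -mem:width 64, -mem:lat 72 3
print(mem_access_latency(64, 64, 72, 3))
# The log's first command line: -mem:width 8, -mem:lat 91 10
print(mem_access_latency(64, 8, 91, 10))
```

With a 64-byte bus, a 64-byte block arrives in a single chunk, so only the 72-cycle first-chunk latency applies. With an 8-byte bus, the same block needs 8 chunks, giving 91 + 7 × 10 = 161 cycles.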


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12555950 # total simulation time in cycles
sim_IPC                      1.7027 # instructions per cycle
sim_CPI                      0.5873 # cycles per instruction
sim_exec_BW                  1.7084 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49080422 # cumulative IFQ occupancy
IFQ_fcount                 11650933 # cumulative IFQ full count
ifq_occupancy                3.9089 # avg IFQ occupancy (insns)
ifq_rate                     1.7084 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2881 # avg IFQ occupant latency (cycles)
ifq_full                     0.9279 # fraction of time (cycles) IFQ was full
RUU_count                 199457825 # cumulative RUU occupancy
RUU_fcount                 12430808 # cumulative RUU full count
ruu_occupancy               15.8855 # avg RUU occupancy (insns)
ruu_rate                     1.7084 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2987 # avg RUU occupant latency (cycles)
ruu_full                     0.9900 # fraction of time (cycles) RUU was full
LSQ_count                  63945155 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0928 # avg LSQ occupancy (insns)
lsq_rate                     1.7084 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9811 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  291185841 # total number of slip cycles
avg_sim_slip                13.6200 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
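With `-cache:il2 dl2`, instruction misses share the `ul2` cache with the data side. These statistics allow a sanity check: if the unified L2 services every il1 miss, every dl1 miss and every dl1 writeback, its access count should equal their sum.

The sketch below checks that for this run (filter) and the run before it. The helper names and the traffic model are assumptions. It also uses a guarded ratio for cases like `bpred_jr_non_ras_rate`. In this run no non-RAS JRs were seen, so sim-outorder printed `<error: divide by zero>` for that rate.

```python
from typing import Dict, Optional


def safe_ratio(num: float, den: float) -> Optional[float]:
    """Return num/den, or None when den is 0 instead of raising."""
    return None if den == 0 else num / den


def expected_l2_accesses(s: Dict[str, int]) -> int:
    """Assumed model: a unified L2 sees every il1 miss plus every dl1 miss
    and every dl1 writeback."""
    return s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]


# Counters copied from this log: the filter run and the run before it.
FILTER = {"il1.misses": 179, "dl1.misses": 3131, "dl1.writebacks": 791,
          "ul2.accesses": 4101, "ul2.misses": 2489}
PREVIOUS = {"il1.misses": 727, "dl1.misses": 287722, "dl1.writebacks": 143444,
            "ul2.accesses": 431893, "ul2.misses": 32394}

for label, run in (("previous", PREVIOUS), ("filter", FILTER)):
    accounted = expected_l2_accesses(run) == run["ul2.accesses"]
    miss_rate = safe_ratio(run["ul2.misses"], run["ul2.accesses"])
    print(f"{label}: L2 traffic accounted for = {accounted}, "
          f"ul2 miss rate = {miss_rate:.4f}")

# No non-RAS JRs were seen in the filter run, so this rate is undefined.
print(safe_ratio(0, 0))
```

Both sums match exactly: 179 + 3131 + 791 = 4101, and 727 + 287722 + 143444 = 431893.

The high ul2 miss rate in this run (0.6069) sits on very little L2 traffic. With `ul2.replacements` at 0, every one of the 2489 misses filled an empty frame, so they are cold misses rather than capacity or conflict misses.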

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 72 3 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:20:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
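The sample rows list their values in N, W, M, X order, while the -bpred:2lev option itself takes <l1size> <l2size> <hist_size> <xor>, i.e. N, M, W, X. Below is a small hypothetical helper (bpred_2lev_args is not part of SimpleScalar) that emits the option arguments in the option's order for the GAg, GAp, PAg and gshare rows. PAp is left out because its M constraint is stated loosely above. Under this reading, the default "1 1024 8 0" is a GAp predictor with an 8-bit global history, since 1024 > 2^8.

```python
from typing import Optional

def bpred_2lev_args(scheme: str, w: int, n: int = 1, m: Optional[int] = None) -> str:
    """Return '<l1size> <l2size> <hist_size> <xor>' (N M W X) for -bpred:2lev."""
    if scheme == "GAg":          # one global history register, 2^W counters
        n, m, x = 1, 2 ** w, 0
    elif scheme == "gshare":     # like GAg, but history XORed with the address
        n, m, x = 1, 2 ** w, 1
    elif scheme == "PAg":        # N per-address histories sharing 2^W counters
        m, x = 2 ** w, 0
    elif scheme == "GAp":        # global history, more than 2^W counters
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs an explicit M greater than 2^W")
        n, x = 1, 0
    else:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return f"{n} {m} {w} {x}"
```

For example, bpred_2lev_args("gshare", 10) gives "1 1024 10 1", which would be passed as -bpred 2lev -bpred:2lev 1 1024 10 1.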

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
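Because every cache level is described by the same four numbers, its capacity follows directly as nsets x assoc x bsize bytes. The minimal sketch below parses a <config> string and reports that figure; the CacheConfig and parse_cache_config names are illustrative, not SimpleScalar code. For a TLB config, <bsize> is the page size, so the same product gives the TLB reach rather than a data capacity. Pointer values such as dl1 or dl2 are not <config> strings and are rejected.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}

@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int   # number of sets
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int   # ways per set
    repl: str    # replacement policy name

    @property
    def capacity_bytes(self) -> int:
        # sets x ways x block size
        return self.nsets * self.assoc * self.bsize

def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"not a <config> string: {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])

if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(cfg.name, cfg.capacity_bytes // 1024, "KB")
```

For the configurations used in these runs, dl1:512:64:2:l and il1:512:64:2:l hold 64 KB each, ul2:1024:64:8:l holds 512 KB, and dtlb:32:4096:4:l covers 512 KB of address space.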

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861171 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37161453 # total simulation time in cycles
sim_IPC                      0.7497 # instructions per cycle
sim_CPI                      1.3339 # cycles per instruction
sim_exec_BW                  0.7497 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 148578185 # cumulative IFQ occupancy
IFQ_fcount                 37144306 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7497 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3328 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 594315044 # cumulative RUU occupancy
RUU_fcount                 37143590 # cumulative RUU full count
ruu_occupancy               15.9928 # avg RUU occupancy (insn's)
ruu_rate                     0.7497 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.3313 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 180662419 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8616 # avg LSQ occupancy (insn's)
lsq_rate                     0.7497 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4844 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  811486162 # total number of slip cycles
avg_sim_slip                29.1276 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
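Each statistics line has the shape <name> <value> # <description>, and several entries are ratios of raw counters: sim_IPC is sim_num_insn / sim_cycle, and each *.miss_rate is misses / accesses. Below is a minimal Python sketch for post-processing such a block; the function names are hypothetical. It turns the lines into a dictionary and maps the <error: divide by zero> placeholder to None. It reads a trailing k (as in mem.page_mem) as KiB, an inference from this log: 370 pages of 4096 bytes is exactly 1480 KiB. It then recomputes a few derived metrics from the counters. The ruu_occupancy formula (RUU_count / sim_cycle) is likewise inferred from the numbers here, not taken from the simulator source.

```python
import re
from typing import Dict, Iterable, Optional, Union

Number = Union[int, float]

# '<name>   <value>   # <description>'; the value may contain spaces,
# as in '<error: divide by zero>', so match lazily up to the first ' #'.
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#")

def _to_number(raw: str) -> Optional[Number]:
    if raw.endswith("k") and raw[:-1].isdigit():
        return int(raw[:-1]) * 1024      # e.g. '1480k' -> bytes (assumed KiB)
    try:
        return int(raw, 0)               # decimal or 0x-prefixed hex
    except ValueError:
        pass
    try:
        return float(raw)                # e.g. '0.7497', '1449647.0000'
    except ValueError:
        return None                      # e.g. '<error: divide by zero>'

def parse_stats(lines: Iterable[str]) -> Dict[str, Optional[Number]]:
    stats: Dict[str, Optional[Number]] = {}
    for line in lines:
        if line.startswith(("#", "-")):  # skip the option listing
            continue
        m = STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = _to_number(m.group(2))
    return stats

def derived_metrics(s: Dict[str, Optional[Number]]) -> Dict[str, float]:
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ruu_occupancy": s["RUU_count"] / s["sim_cycle"],
    }
```

Fed the statistics block above, derived_metrics reproduces the logged sim_IPC (0.7497), sim_CPI (1.3339), dl1.miss_rate (0.2972) and ruu_occupancy (15.9928) to four decimal places.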

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:20:36 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6836457 # total simulation time in cycles
sim_IPC                      1.9172 # instructions per cycle
sim_CPI                      0.5216 # cycles per instruction
sim_exec_BW                  1.9234 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25451776 # cumulative IFQ occupancy
IFQ_fcount                  6237341 # cumulative IFQ full count
ifq_occupancy                3.7229 # avg IFQ occupancy (insn's)
ifq_rate                     1.9234 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9356 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105086477 # cumulative RUU occupancy
RUU_fcount                  5673502 # cumulative RUU full count
ruu_occupancy               15.3715 # avg RUU occupancy (insn's)
ruu_rate                     1.9234 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9920 # avg RUU occupant latency (cycle's)
ruu_full                     0.8299 # fraction of time (cycle's) RUU was full
LSQ_count                  32118426 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6981 # avg LSQ occupancy (insn's)
lsq_rate                     1.9234 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4427 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154255817 # total number of slip cycles
avg_sim_slip                11.7689 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
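The matrix run above (the -mem:lat 76 2 command line) can be summarized with a textbook average-memory-access-time estimate. This is a rough sketch that rests on several assumptions. A block transfer is assumed to cost <first_chunk> + <inter_chunk> x (chunks - 1) cycles with chunks = ceil(block size / bus width), which ignores the -mem:minBurstLength and -mem:maxBurstLength options of this build. Local miss rates are taken from the dl1 and ul2 counters, even though ul2 also serves instruction misses and dl1 writebacks. TLB misses and queueing are not modeled. The function names are hypothetical.

```python
import math

def block_transfer_latency(first_chunk: int, inter_chunk: int,
                           block_bytes: int, bus_width: int) -> int:
    # Assumed model: the first chunk pays the full latency, each later chunk inter_chunk.
    chunks = math.ceil(block_bytes / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)

def amat(l1_hit: float, l1_miss_rate: float,
         l2_hit: float, l2_miss_rate: float, mem_latency: float) -> float:
    # Two-level average memory access time using local miss rates.
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

if __name__ == "__main__":
    mem = block_transfer_latency(76, 2, 64, 8)   # 64-byte ul2 blocks, 8-byte bus
    est = amat(3, 5727 / 4013174,                # dl1 hit latency, dl1 misses/accesses
               12, 2268 / 6397,                  # ul2 hit latency, ul2 misses/accesses
               mem)
    print(f"memory block latency: {mem} cycles, estimated AMAT: {est:.2f} cycles")
```

With these inputs the block latency is 90 cycles and the estimate comes out near 3.06 cycles. This mainly reflects the very low dl1 miss rate (0.0014) in this run.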

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:20:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
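The range grammar above can be made concrete with a small parser. This is a minimal Python sketch, not SimpleScalar's own code: the function names and the ("addr" | "cycle" | "inst", value) tuple encoding are hypothetical, numbers are read as decimal or 0x-prefixed hex, program symbols such as `main` are left unresolved, and a `+` end is assumed to inherit the kind of the start value.

```python
from typing import Optional, Tuple, Union

Value = Union[int, str]
Bound = Optional[Tuple[str, Value]]

# Prefix characters from the range grammar; no prefix means an instruction count.
_KINDS = {"@": "addr", "#": "cycle"}


def _parse_bound(text: str) -> Tuple[str, Value]:
    """Split an optional @/# prefix off one (non-empty) end of the range."""
    kind = _KINDS.get(text[0], "inst")
    body = text[1:] if text[0] in _KINDS else text
    try:
        # Base 0 accepts decimal and 0x-prefixed hex values.
        return kind, int(body, 0)
    except ValueError:
        # Anything non-numeric is kept as an unresolved program symbol.
        return kind, body


def parse_ptrace_range(spec: str) -> Tuple[Bound, Bound]:
    """Parse '{{@|#}<start>}:{{@|#|+}<end>}' into (start, end); None means open."""
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"pipetrace range needs a ':' separator: {spec!r}")
    start: Bound = _parse_bound(start_txt) if start_txt else None
    end: Bound = None
    if end_txt.startswith("+"):
        if start is None:
            raise ValueError("a relative (+) end needs a start value")
        kind, base = start
        offset = int(end_txt[1:], 0)
        # Assumption: the relative end uses the same kind as the start.
        if isinstance(base, int):
            end = (kind, base + offset)
        else:
            end = (kind, f"{base}+{offset}")
    elif end_txt:
        end = _parse_bound(end_txt)
    return start, end
```

For example, `parse_ptrace_range("1000:+100")` yields `(("inst", 1000), ("inst", 1100))`, matching the 1000:+100 == 1000:1100 rule, while `parse_ptrace_range(":")` yields `(None, None)`, i.e., trace the entire execution.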

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
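The sample table lists fields in N, W, M, X order, while `-bpred:2lev` takes `<l1size> <l2size> <hist_size> <xor>`, i.e., N, M, W, X. A small helper can translate a scheme name into command-line order. The Python below is a hypothetical sketch: `two_level_args` and `l2_index` are illustrative names, PAp is left out because the table's M == 2^(N+W) only makes sense if N is read as log2 of the first-level size, and `l2_index` shows textbook GAg/gshare indexing over W history bits rather than SimpleScalar's exact bpred.c computation (the branch address is assumed to be pre-shifted to drop alignment bits).

```python
from typing import Tuple


def two_level_args(scheme: str, w: int, n: int = 1, m: int = 0) -> Tuple[int, int, int, int]:
    """Return (l1size, l2size, hist_size, xor), the -bpred:2lev argument order."""
    scheme = scheme.lower()
    if scheme == "gag":      # one global history register, 2^W counters
        return (1, 2 ** w, w, 0)
    if scheme == "gap":      # one global history register, M > 2^W counters
        if m <= 2 ** w:
            raise ValueError("GAp requires M > 2^W")
        return (1, m, w, 0)
    if scheme == "pag":      # N per-address history registers sharing 2^W counters
        return (n, 2 ** w, w, 0)
    if scheme == "gshare":   # like GAg, but history is XORed with the address
        return (1, 2 ** w, w, 1)
    raise ValueError(f"unsupported scheme: {scheme}")


def l2_index(history: int, branch_addr: int, w: int, xor: bool) -> int:
    """Textbook second-level index: W history bits, optionally XORed with the address."""
    mask = (1 << w) - 1
    return ((history ^ branch_addr) if xor else history) & mask
```

Under this reading, the logged default `-bpred:2lev 1 1024 8 0` is a GAp configuration, since 1024 > 2^8 = 256, and `two_level_args("gshare", 8)` gives `(1, 256, 8, 1)`.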

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
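The `<name>:<nsets>:<bsize>:<assoc>:<repl>` format above determines a cache's capacity and its address split. The following Python sketch (`CacheConfig` and `parse_cache_config` are hypothetical names, not part of SimpleScalar) parses a config string, computes capacity as nsets x bsize x assoc, and derives the block-offset and set-index bit widths. It rejects non-power-of-two set counts and block sizes because that bit-field split requires them.

```python
from dataclasses import dataclass

# Replacement codes documented in the help text.
_REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


def _is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # bytes per block
    assoc: int  # ways per set
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc

    @property
    def offset_bits(self) -> int:
        # log2(bsize) low address bits select a byte within the block.
        return self.bsize.bit_length() - 1

    @property
    def index_bits(self) -> int:
        # The next log2(nsets) address bits select the set.
        return self.nsets.bit_length() - 1


def parse_cache_config(spec: str) -> CacheConfig:
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in _REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), _REPL[repl])
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)) or cfg.assoc < 1:
        raise ValueError("nsets and bsize must be powers of two and assoc >= 1")
    return cfg
```

Applied to the configurations used in these runs, `dl1:512:64:2:l` is a 64 KiB two-way LRU cache with 6 offset bits and 9 index bits, and `ul2:1024:64:8:l` is a 512 KiB eight-way cache.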



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264737 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6388577 # total simulation time in cycles
sim_IPC                      1.8128 # instructions per cycle
sim_CPI                      0.5516 # cycles per instruction
sim_exec_BW                  1.9198 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18855131 # cumulative IFQ occupancy
IFQ_fcount                  3885623 # cumulative IFQ full count
ifq_occupancy                2.9514 # avg IFQ occupancy (insn's)
ifq_rate                     1.9198 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5373 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6082 # fraction of time (cycle's) IFQ was full
RUU_count                  77666788 # cumulative RUU occupancy
RUU_fcount                  3283689 # cumulative RUU full count
ruu_occupancy               12.1571 # avg RUU occupancy (insn's)
ruu_rate                     1.9198 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3325 # avg RUU occupant latency (cycle's)
ruu_full                     0.5140 # fraction of time (cycle's) RUU was full
LSQ_count                  32216710 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0429 # avg LSQ occupancy (insn's)
lsq_rate                     1.9198 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6268 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123950594 # total number of slip cycles
avg_sim_slip                10.7025 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
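Several statistics above are formulas over the raw counters, so they can be cross-checked from the log itself. The Python sketch below is hypothetical tooling, not part of SimpleScalar: `parse_stats` reads `name value # description` lines and `derived_stats` recomputes a few values from the counters. The formulas (for example, RUU latency as RUU_count / sim_total_insn) are inferred from the printed numbers, which they reproduce to the four decimals shown; the sample lines are excerpted from the run above.

```python
import re
from typing import Dict, Union

Stat = Union[int, float, str]

# "<name> <value> # <description>", as printed in the statistics section.
_STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")

SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_branches            3128464 # total number of branches committed
sim_total_insn             12264737 # total number of instructions executed
sim_cycle                   6388577 # total simulation time in cycles
sim_IPC                      1.8128 # instructions per cycle
sim_IPB                      3.7020 # instructions per branch
RUU_count                  77666788 # cumulative RUU occupancy
ruu_occupancy               12.1571 # avg RUU occupancy (insn's)
ruu_latency                  6.3325 # avg RUU occupant latency (cycle's)
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
mem.page_mem                   292k # total size of memory pages allocated
"""


def parse_stats(text: str) -> Dict[str, Stat]:
    stats: Dict[str, Stat] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out options
        match = _STAT_LINE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            stats[name] = int(raw, 0)  # plain counters and 0x addresses
        except ValueError:
            try:
                stats[name] = float(raw)  # rates and averages
            except ValueError:
                stats[name] = raw  # sizes such as "292k"
    return stats


def derived_stats(s: Dict[str, Stat]) -> Dict[str, float]:
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "ruu_occupancy": s["RUU_count"] / s["sim_cycle"],
        "ruu_latency": s["RUU_count"] / s["sim_total_insn"],
        "bpred_bimod.bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    }


stats = parse_stats(SAMPLE)
for name, value in derived_stats(stats).items():
    print(f"{name:28s} recomputed {value:.4f}  logged {stats[name]:.4f}")
```

Each recomputed value rounds to the logged one. The RUU figures are also consistent with Little's law: ruu_occupancy is approximately ruu_rate x ruu_latency (1.9198 x 6.3325, about 12.157 instructions).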

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:20:53 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
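The `-mem:lat <first_chunk> <inter_chunk>` and `-mem:width` options describe how long a cache-block fill from main memory takes. Stock sim-outorder is understood to charge the first-chunk latency for the first bus-width transfer and the inter-chunk latency for each further transfer. The Python sketch below (`mem_block_latency` is a hypothetical name) reproduces that model. This build also lists `-mem:maxBurstLength` and `-mem:minBurstLength`, which appear to be local additions whose effect the log does not describe, so the sketch ignores them and its result is only an approximation for this configuration.

```python
def mem_block_latency(block_bytes: int, bus_width: int, first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block over the memory bus: first chunk + remaining chunks."""
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_bytes // bus_width)  # ceiling division: number of bus transfers
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks over an 8-byte bus with -mem:lat 76 2.
print(mem_block_latency(64, 8, 76, 2))
```

With the options above, a 64-byte ul2 block needs 8 transfers, giving 76 + 7 x 2 = 90 cycles, ignoring the burst-length options.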

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375188 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9518653 # total simulation time in cycles
sim_IPC                      1.3995 # instructions per cycle
sim_CPI                      0.7146 # cycles per instruction
sim_exec_BW                  1.4052 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  37078995 # cumulative IFQ occupancy
IFQ_fcount                  9119104 # cumulative IFQ full count
ifq_occupancy                3.8954 # avg IFQ occupancy (insn's)
ifq_rate                     1.4052 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7722 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 149230569 # cumulative RUU occupancy
RUU_fcount                  8978167 # cumulative RUU full count
ruu_occupancy               15.6777 # avg RUU occupancy (insn's)
ruu_rate                     1.4052 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.1573 # avg RUU occupant latency (cycle's)
ruu_full                     0.9432 # fraction of time (cycle's) RUU was full
LSQ_count                  78576435 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2550 # avg LSQ occupancy (insn's)
lsq_rate                     1.4052 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.8748 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  247522170 # total number of slip cycles
avg_sim_slip                18.5815 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
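The cache statistics for this fft run can be folded into a rough average memory access time (AMAT) for data references. The Python below is a textbook serial-lookup estimate, not something sim-outorder computes. It uses the dl1 and ul2 hit latencies (3 and 12 cycles) and miss rates (0.0466 and 0.0750) reported for this run, and assumes a 90-cycle main-memory block fill (76 cycles for the first 8-byte chunk plus 2 cycles for each of the other seven chunks of a 64-byte block, ignoring the burst-length options). It ignores out-of-order overlap of misses, TLB misses, writebacks, and the instruction traffic that shares ul2, so treat the result as a rough indicator only.

```python
def amat(l1_hit: float, l1_miss_rate: float,
         l2_hit: float, l2_miss_rate: float, mem_latency: float) -> float:
    """Two-level AMAT: L1 hit time plus L1 misses weighted by the L2/memory cost."""
    l2_cost = l2_hit + l2_miss_rate * mem_latency
    return l1_hit + l1_miss_rate * l2_cost


# Values taken from the fft run's options and statistics.
fft = amat(l1_hit=3, l1_miss_rate=0.0466, l2_hit=12, l2_miss_rate=0.0750, mem_latency=90)
print(f"estimated data AMAT: {fft:.2f} cycles")
```

This gives about 3.87 cycles per data reference against 3 cycles for an all-hit dl1; the extra 0.87 cycles is the 18.75-cycle L2/memory cost weighted by fft's 0.0466 dl1 miss rate.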

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:21:04 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
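  The range syntax above can be illustrated with a small Python sketch that splits a range
  argument into typed endpoints. `parse_ptrace_range` and `Endpoint` are hypothetical names,
  not SimpleScalar's own C range parser. Numbers are assumed to be decimal or 0x-hex, and
  program symbols such as `main` are returned unresolved, because resolving them needs the
  simulated binary's symbol table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Endpoint kinds keyed by prefix: '@' address, '#' cycle, no prefix = instruction count.
KINDS = {"@": "address", "#": "cycle", "": "instruction"}

Value = Union[int, str]  # a number, or an unresolved program symbol such as "main"


@dataclass
class Endpoint:
    kind: str
    value: Value
    relative: bool = False  # True when a '+' offset could not be folded into a number


def _split_prefix(text: str) -> Tuple[str, str]:
    prefix = text[0] if text[0] in "@#" else ""
    return prefix, text[len(prefix):]


def _parse_value(text: str) -> Value:
    try:
        return int(text, 0)  # decimal or 0x-prefixed hex (assumption)
    except ValueError:
        return text          # anything else is kept as a program symbol


def parse_ptrace_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '{{@|#}<start>}:{{@|#|+}<end>}'; None marks an open end."""
    if ":" not in spec:
        raise ValueError(f"missing ':' in {spec!r}")
    start_txt, end_txt = spec.split(":", 1)

    start = None
    if start_txt:
        prefix, body = _split_prefix(start_txt)
        start = Endpoint(KINDS[prefix], _parse_value(body))

    end = None
    if end_txt.startswith("+"):
        # A '+' end is an offset from the start and shares its kind.
        if start is None:
            raise ValueError("a '+' end needs a start")
        offset = _parse_value(end_txt[1:])
        if isinstance(start.value, int) and isinstance(offset, int):
            end = Endpoint(start.kind, start.value + offset)
        else:
            end = Endpoint(start.kind, offset, relative=True)
    elif end_txt:
        prefix, body = _split_prefix(end_txt)
        end = Endpoint(KINDS[prefix], _parse_value(body))
    return start, end
```

  For example, "1000:+100" yields instruction-count endpoints 1000 and 1100, matching the
  1000:+100 == 1000:1100 rule, and ":" yields two open ends, i.e., trace everything.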

  Branch predictor configuration examples for the 2-level predictor:
    Configurations:   N, M, W, X   (argument order of -bpred:2lev)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
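  The option dump describes -bpred:2lev as (<l1size> <l2size> <hist_size> <xor>), i.e., the
  order N, M, W, X. The Python sketch below builds that argument tuple for each named scheme.
  `twolev_args` and `bpred_flags` are hypothetical helpers. For PAp it assumes N counts
  history registers, so each register gets its own 2^W-counter table (M = N * 2^W).

```python
# Tuples are returned in -bpred:2lev order: (N=l1size, M=l2size, W=hist_size, X=xor).
def twolev_args(scheme: str, w: int, n: int = 1, m: int = 0) -> tuple:
    """w: history width in bits; n: first-level registers (PAg/PAp); m: table size (GAp)."""
    if scheme == "GAg":
        return (1, 2 ** w, w, 0)        # one global history, one shared table
    if scheme == "GAp":
        if m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        return (1, m, w, 0)             # extra address bits select among tables
    if scheme == "PAg":
        return (n, 2 ** w, w, 0)        # per-address histories, shared table
    if scheme == "PAp":
        return (n, n * 2 ** w, w, 0)    # per-address histories and tables (assumed M = N*2^W)
    if scheme == "gshare":
        return (1, 2 ** w, w, 1)        # global history XORed with the branch address
    raise ValueError(f"unknown scheme {scheme!r}")


def bpred_flags(scheme: str, **kwargs) -> str:
    """Render the scheme as sim-outorder command-line flags."""
    args = twolev_args(scheme, **kwargs)
    return "-bpred 2lev -bpred:2lev " + " ".join(str(a) for a in args)
```

  For example, bpred_flags("gshare", w=12) gives "-bpred 2lev -bpred:2lev 1 4096 12 1", and
  the default "1 1024 8 0" shown in the option dump is a GAp configuration with W = 8.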

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
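  A short Python sketch of the <name>:<nsets>:<bsize>:<assoc>:<repl> format. It also computes
  the capacity implied by a configuration (sets x ways x block size). `CacheConfig` and
  `parse_cache_config` are hypothetical names. The power-of-two check on set count and block
  size is an assumption of this sketch, based on index and offset bits coming straight from
  the address.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (the page size, for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Bytes covered: sets x ways x block size (TLB reach for a TLB config)."""
        return self.nsets * self.assoc * self.bsize


def _is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL_POLICIES[repl])
    # Assumed constraint: power-of-two sets and block size, at least one way.
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)) or cfg.assoc < 1:
        raise ValueError(f"bad geometry in {text!r}")
    return cfg
```

  With the geometry used in this log, dl1 and il1 (512:64:2) hold 64 KiB each and ul2
  (1024:64:8) holds 512 KiB. The dtlb (32 sets x 4 ways of 4096-byte pages) reaches 512 KiB.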

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 16 # total simulation time in seconds
sim_inst_rate          1336203.5000 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12596378 # total simulation time in cycles
sim_IPC                      1.6973 # instructions per cycle
sim_CPI                      0.5892 # cycles per instruction
sim_exec_BW                  1.7029 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49229606 # cumulative IFQ occupancy
IFQ_fcount                 11688229 # cumulative IFQ full count
ifq_occupancy                3.9082 # avg IFQ occupancy (insn's)
ifq_rate                     1.7029 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2951 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200055443 # cumulative RUU occupancy
RUU_fcount                 12468104 # cumulative RUU full count
ruu_occupancy               15.8820 # avg RUU occupancy (insn's)
ruu_rate                     1.7029 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3265 # avg RUU occupant latency (cycle's)
ruu_full                     0.9898 # fraction of time (cycle's) RUU was full
LSQ_count                  64132571 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0914 # avg LSQ occupancy (insn's)
lsq_rate                     1.7029 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9898 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291970767 # total number of slip cycles
avg_sim_slip                13.6567 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
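  Most ratios in the statistics block above can be recomputed from its raw counters. The
  Python sketch below does this using formulas inferred from the stat descriptions and
  checked against the printed values, not taken from the sim-outorder source. `ratio` returns
  None where the log prints "<error: divide by zero>". The queue figures satisfy Little's law:
  occupancy = dispatch rate x latency. The L2 line is an observation about this log: ul2
  traffic equals I-cache misses plus D-cache misses and writebacks, which fits il2 pointing
  at dl2.

```python
import math

# Counters copied from the statistics block above.
s = {
    "sim_num_insn": 21379256, "sim_total_insn": 21450114,
    "sim_num_branches": 1647342, "sim_cycle": 12596378,
    "IFQ_count": 49229606, "RUU_count": 200055443, "LSQ_count": 64132571,
    "sim_slip": 291970767,
    "bpred_bimod.updates": 1647342, "bpred_bimod.addr_hits": 1638967,
    "bpred_bimod.dir_hits": 1639058,
    "bpred_bimod.jr_non_ras_hits.PP": 0, "bpred_bimod.jr_non_ras_seen.PP": 0,
    "il1.misses": 179, "dl1.misses": 3131, "dl1.writebacks": 791,
    "ul2.accesses": 4101, "ul2.misses": 2489,
}


def ratio(num, den):
    """num/den, or None where the simulator prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


derived = {
    "sim_IPC": ratio(s["sim_num_insn"], s["sim_cycle"]),
    "sim_CPI": ratio(s["sim_cycle"], s["sim_num_insn"]),
    "sim_exec_BW": ratio(s["sim_total_insn"], s["sim_cycle"]),
    "sim_IPB": ratio(s["sim_num_insn"], s["sim_num_branches"]),
    "avg_sim_slip": ratio(s["sim_slip"], s["sim_num_insn"]),
    "bpred_addr_rate": ratio(s["bpred_bimod.addr_hits"], s["bpred_bimod.updates"]),
    "bpred_dir_rate": ratio(s["bpred_bimod.dir_hits"], s["bpred_bimod.updates"]),
    "bpred_jr_non_ras_rate": ratio(s["bpred_bimod.jr_non_ras_hits.PP"],
                                   s["bpred_bimod.jr_non_ras_seen.PP"]),
    "ul2.miss_rate": ratio(s["ul2.misses"], s["ul2.accesses"]),
}

# Queues: occupancy = count/cycles, latency = count/insts executed,
# rate = insts executed/cycles, hence occupancy = rate x latency (Little's law).
rate = ratio(s["sim_total_insn"], s["sim_cycle"])
for q in ("IFQ", "RUU", "LSQ"):
    occupancy = ratio(s[f"{q}_count"], s["sim_cycle"])
    latency = ratio(s[f"{q}_count"], s["sim_total_insn"])
    derived[f"{q.lower()}_occupancy"] = occupancy
    derived[f"{q.lower()}_latency"] = latency
    assert math.isclose(occupancy, rate * latency)

# Observed identity: unified L2 accesses = il1 misses + dl1 misses + dl1 writebacks.
l2_traffic = s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]
```

  Rounded to four decimals, these reproduce the printed values, e.g., sim_IPC 1.6973,
  ruu_latency 9.3265 and avg_sim_slip 13.6567. l2_traffic equals ul2.accesses
  (179 + 3131 + 791 = 4101). The same L2 identity holds for the alphaBlend block below
  (211 + 2572213 + 957274 = 3529698).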

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:21:20 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 25 # total simulation time in seconds
sim_inst_rate          1114387.6800 # simulation speed (in insts/sec)
sim_total_insn             27861315 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37975575 # total simulation time in cycles
sim_IPC                      0.7336 # instructions per cycle
sim_CPI                      1.3631 # cycles per instruction
sim_exec_BW                  0.7337 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 151819625 # cumulative IFQ occupancy
IFQ_fcount                 37954666 # cumulative IFQ full count
ifq_occupancy                3.9978 # avg IFQ occupancy (insn's)
ifq_rate                     0.7337 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4491 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 607281578 # cumulative RUU occupancy
RUU_fcount                 37953914 # cumulative RUU full count
ruu_occupancy               15.9914 # avg RUU occupancy (insn's)
ruu_rate                     0.7337 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.7966 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 184580137 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8605 # avg LSQ occupancy (insn's)
lsq_rate                     0.7337 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6250 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  828370342 # total number of slip cycles
avg_sim_slip                29.7337 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
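  Comparing runs such as the ones in this log is easier once each statistics block is
  loaded into a dictionary. The Python sketch below is a minimal parser; `parse_stats` and
  `parse_value` are hypothetical names. It assumes each stat is one line of the form
  "<name> <value> # <description>", and that a block ends at the first blank line.
  It also assumes a "k" suffix means KiB: here 370 pages of 4096 bytes print as 1480k.

```python
import re

STATS_MARKER = "sim: ** simulation statistics **"
# <name> <value> # <description>; the value may contain spaces ("<error: divide by zero>").
STAT_LINE = re.compile(r"^(\S+)\s+(.+?)\s+#\s?(.*)$")


def parse_value(text: str):
    """Convert a printed value to int/float; None for errors; raw string otherwise."""
    if text.startswith("<error"):
        return None
    if text.startswith("0x"):
        return int(text, 16)
    if text.endswith("k") and text[:-1].isdigit():
        return int(text[:-1]) * 1024  # assumed KiB
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_stats(log_text: str) -> dict:
    """Collect the first stat block after the statistics marker, up to a blank line."""
    stats, inside = {}, False
    for line in log_text.splitlines():
        if line.strip() == STATS_MARKER:
            inside = True
            continue
        if inside:
            if not line.strip():
                break
            m = STAT_LINE.match(line)
            if m:
                stats[m.group(1)] = parse_value(m.group(2))
    return stats
```

  For a file holding several runs, one approach is to split the text on the dashed
  separator lines first, then call parse_stats on each piece.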

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:21:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
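The cache and TLB strings in the option dump above use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format that sim-outorder documents in its help text. A quick sanity check on their size:

- capacity is nsets × bsize × assoc;
- for a TLB the "block size" is the page size, so entries = nsets × assoc and reach = entries × page size.

The sketch below parses these strings. `CacheConfig` and `parse_cache_config` are hypothetical helpers, not part of SimpleScalar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def entries(self) -> int:
        # total number of blocks (or TLB entries)
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        # bytes held by a cache, or bytes mapped ("reach") by a TLB
        return self.entries * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (hypothetical helper)."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        c = parse_cache_config(spec)
        print(f"{c.name}: {c.entries} entries, {c.capacity_bytes // 1024} KiB")
```

With these settings each L1 cache holds 64 KiB and the unified L2 holds 512 KiB. The data TLB maps 128 × 4 KiB = 512 KiB. The `-bpred:btb 512 4` setting gives, by the same arithmetic, 512 × 4 = 2048 BTB entries.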


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6820753 # total simulation time in cycles
sim_IPC                      1.9217 # instructions per cycle
sim_CPI                      0.5204 # cycles per instruction
sim_exec_BW                  1.9278 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25394560 # cumulative IFQ occupancy
IFQ_fcount                  6223037 # cumulative IFQ full count
ifq_occupancy                3.7231 # avg IFQ occupancy (insn's)
ifq_rate                     1.9278 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9313 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104856989 # cumulative RUU occupancy
RUU_fcount                  5659198 # cumulative RUU full count
ruu_occupancy               15.3732 # avg RUU occupancy (insn's)
ruu_rate                     1.9278 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9745 # avg RUU occupant latency (cycle's)
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
LSQ_count                  32041754 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6977 # avg LSQ occupancy (insn's)
lsq_rate                     1.9278 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4368 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153949801 # total number of slip cycles
avg_sim_slip                11.7455 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
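Most of the rate and average lines in the statistics block above are ratios of raw counters printed in the same block. The sketch below recomputes several of them from this run's counters. The formulas are inferred from the printed descriptions and checked against the printed values; they are not taken from the simulator source. For example, sim_IPC = sim_num_insn / sim_cycle and avg_sim_slip = sim_slip / sim_num_insn.

The hypothetical `ratio` helper returns None on a zero denominator. That is the case behind `<error: divide by zero>` for bpred_jr_non_ras_rate.PP in this run, where jr_non_ras_seen.PP is 0.

```python
from typing import Optional

# Raw counters copied from the statistics block above.
s = {
    "sim_num_insn": 13107124,
    "sim_num_loads": 3000354,
    "sim_num_stores": 1013368,
    "sim_num_refs": 4013722,
    "sim_num_branches": 1010850,
    "sim_total_insn": 13148990,
    "sim_cycle": 6820753,
    "IFQ_count": 25394560,
    "sim_slip": 153949801,
    "bpred_updates": 1010850,
    "bpred_dir_hits": 1000659,
    "bpred_misses": 10191,
    "jr_non_ras_hits": 0,
    "jr_non_ras_seen": 0,
    "dl1_accesses": 4013174,
    "dl1_misses": 5727,
    "ul2_accesses": 6397,
    "ul2_misses": 2268,
}


def ratio(num: float, den: float) -> Optional[float]:
    """num/den, or None where the simulator would print a divide-by-zero error."""
    return None if den == 0 else num / den


derived = {
    "sim_IPC": ratio(s["sim_num_insn"], s["sim_cycle"]),
    "sim_CPI": ratio(s["sim_cycle"], s["sim_num_insn"]),
    "sim_exec_BW": ratio(s["sim_total_insn"], s["sim_cycle"]),
    "sim_IPB": ratio(s["sim_num_insn"], s["sim_num_branches"]),
    "ifq_occupancy": ratio(s["IFQ_count"], s["sim_cycle"]),
    "avg_sim_slip": ratio(s["sim_slip"], s["sim_num_insn"]),
    "bpred_dir_rate": ratio(s["bpred_dir_hits"], s["bpred_updates"]),
    "bpred_jr_non_ras_rate": ratio(s["jr_non_ras_hits"], s["jr_non_ras_seen"]),
    "dl1.miss_rate": ratio(s["dl1_misses"], s["dl1_accesses"]),
    "ul2.miss_rate": ratio(s["ul2_misses"], s["ul2_accesses"]),
}

# Relations that hold for the counters above.
assert s["sim_num_loads"] + s["sim_num_stores"] == s["sim_num_refs"]
assert s["bpred_updates"] - s["bpred_dir_hits"] == s["bpred_misses"]

for name, value in derived.items():
    text = "<divide by zero>" if value is None else f"{value:.4f}"
    print(f"{name:24s} {text}")
```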

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:21:53 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
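`-mem:lat` gives the latency of the first chunk of a memory transfer and of each later chunk. `-mem:width` gives the bus width in bytes.

The sketch below assumes the stock SimpleScalar 3.0 model: a block is split into ceil(block size / bus width) chunks, and latency = first + inter × (chunks − 1). Under that assumption, a 64-byte ul2 block fill here costs 76 + 2 × 3 = 82 cycles. The `-mem:maxBurstLength` and `-mem:minBurstLength` options appear to be additions in this build. How it applies them is unknown, and the sketch does not model them.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block over the memory bus.

    Assumes the stock SimpleScalar 3.0 model: the block is split into
    ceil(block_bytes / bus_width) chunks; the first chunk costs
    `first_chunk` cycles and every later chunk costs `inter_chunk`.
    Burst-length limits are NOT modeled.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (block_bytes + bus_width - 1) // bus_width  # integer ceiling
    return first_chunk + inter_chunk * (chunks - 1)


# -mem:lat 76 2, -mem:width 16, 64-byte ul2 blocks (from the options above)
print(mem_access_latency(64, 16, 76, 2))  # 82
```

For comparison, consider the `-mem:lat 91 10 -mem:width 8` setting on the matrix command line at the top of this log. Under the same assumption it gives 91 + 10 × 7 = 161 cycles per 64-byte block.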


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264705 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6366801 # total simulation time in cycles
sim_IPC                      1.8190 # instructions per cycle
sim_CPI                      0.5497 # cycles per instruction
sim_exec_BW                  1.9264 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18774843 # cumulative IFQ occupancy
IFQ_fcount                  3865551 # cumulative IFQ full count
ifq_occupancy                2.9489 # avg IFQ occupancy (insn's)
ifq_rate                     1.9264 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5308 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6071 # fraction of time (cycle's) IFQ was full
RUU_count                  77345468 # cumulative RUU occupancy
RUU_fcount                  3263625 # cumulative RUU full count
ruu_occupancy               12.1482 # avg RUU occupancy (insn's)
ruu_rate                     1.9264 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3063 # avg RUU occupant latency (cycle's)
ruu_full                     0.5126 # fraction of time (cycle's) RUU was full
LSQ_count                  32144254 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0487 # avg LSQ occupancy (insn's)
lsq_rate                     1.9264 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6209 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123556818 # total number of slip cycles
avg_sim_slip                10.6685 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
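Several counters in these statistics blocks are tied together:

- **L2 accesses.** In this run, ul2.accesses equals il1.misses + dl1.misses + dl1.writebacks (217 + 11205 + 7079 = 18501). This fits il2 being pointed at the unified dl2, with ul2.writebacks at 0. The earlier run in this section matches too (178 + 5727 + 492 = 6397).
- **Memory pages.** mem.page_mem is page_count × 4 KiB (73 × 4 = 292k here, 24 × 4 = 96k earlier).

The sketch below encodes these relations. They are observations from these two runs, not documented invariants, and may not hold for other hierarchies. `RunCounters` and `check` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunCounters:
    """A few raw counters from one sim-outorder statistics block."""
    il1_misses: int
    dl1_misses: int
    dl1_writebacks: int
    ul2_accesses: int
    page_count: int
    page_mem_kib: int
    num_loads: int
    num_stores: int
    num_refs: int


def check(run: RunCounters, page_kib: int = 4) -> List[str]:
    """Return the relations that do NOT hold (empty list = all consistent)."""
    problems = []
    # L2 traffic = instruction misses + data misses + dirty L1 evictions
    if run.il1_misses + run.dl1_misses + run.dl1_writebacks != run.ul2_accesses:
        problems.append("ul2.accesses != il1.misses + dl1.misses + dl1.writebacks")
    # simulated memory is allocated in fixed-size pages (4 KiB assumed here)
    if run.page_count * page_kib != run.page_mem_kib:
        problems.append("mem.page_mem != mem.page_count * page size")
    if run.num_loads + run.num_stores != run.num_refs:
        problems.append("sim_num_refs != sim_num_loads + sim_num_stores")
    return problems


run1 = RunCounters(178, 5727, 492, 6397, 24, 96, 3000354, 1013368, 4013722)
run2 = RunCounters(217, 11205, 7079, 18501, 73, 292, 2602463, 1880358, 4482821)
print(check(run1), check(run2))
```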

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:22:01 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses OK)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
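The memory options above determine how long an L2 miss waits for main memory. The sketch below assumes the stock SimpleScalar 3.0 rule: the first bus-width chunk of a block costs <first_chunk> cycles and each further chunk costs <inter_chunk> cycles. The -mem:maxBurstLength and -mem:minBurstLength options belong to this particular build and are not modeled. The function name is hypothetical.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from main memory (stock SimpleScalar 3.0 rule, assumed)."""
    # Number of bus-width transfers needed to move the block (ceiling division).
    chunks = (block_bytes + bus_width - 1) // bus_width
    # The first chunk pays the full access latency; each later chunk adds inter_chunk.
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 uses 64-byte blocks; -mem:width 16 and -mem:lat 76 2 as listed above.
print(mem_access_latency(64, 16, 76, 2))
```

Under this assumption a 64-byte ul2 block needs 4 chunks and 82 cycles here, while the matrix run's -mem:width 8 -mem:lat 91 10 would need 8 chunks and 161 cycles.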

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
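A minimal sketch of how a range string in the format above can be split into its two ends. It assumes decimal numbers (or 0x-prefixed hex) and leaves program symbols such as `main` unresolved, since resolving them needs the program's symbol table. RangePos and parse_ptrace_range are hypothetical names, not SimpleScalar functions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Prefix character -> kind of range endpoint; no prefix means instruction count.
KINDS = {"@": "addr", "#": "cycle"}


@dataclass(frozen=True)
class RangePos:
    kind: str               # "addr" (@), "cycle" (#) or "insn" (no prefix)
    value: Union[int, str]  # a number, or an unresolved program symbol


def _to_value(body: str) -> Union[int, str]:
    """Parse a decimal or 0x-prefixed number; keep anything else as a symbol."""
    try:
        return int(body, 16) if body.lower().startswith("0x") else int(body, 10)
    except ValueError:
        return body


def _parse_pos(text: str) -> RangePos:
    kind = KINDS.get(text[0], "insn")
    body = text[1:] if text[0] in KINDS else text
    return RangePos(kind, _to_value(body))


def parse_ptrace_range(spec: str) -> Tuple[Optional[RangePos], Optional[RangePos]]:
    """Split '{{@|#}<start>}:{{@|#|+}<end>}' into (start, end); None means open-ended."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _parse_pos(start_text) if start_text else None
    if not end_text:
        return start, None
    if end_text.startswith("+"):
        # '+N' is relative to the start and inherits the start's kind.
        if start is None:
            raise ValueError("a '+' end needs a start value")
        offset = int(end_text[1:], 10)
        if isinstance(start.value, int):
            return start, RangePos(start.kind, start.value + offset)
        # Symbolic start: keep the sum symbolic, e.g. 'main+278'.
        return start, RangePos(start.kind, f"{start.value}+{offset}")
    return start, _parse_pos(end_text)


for example in ["#0:#1000", "@2000:", ":1500", ":", "1000:+100", "@main:+278"]:
    print(example, parse_ptrace_range(example))
```

The `:` form returns (None, None), matching "if neither is specified, the entire execution is traced".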

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (arguments in -bpred:2lev order N, M, W, X):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N*2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
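To make the 2-level parameters concrete, the sketch below computes the first- and second-level table indices for one branch. The indexing is modeled on SimpleScalar 3.0's bpred.c as we understand it (history in the low W bits, shifted branch-address bits above them, an optional XOR on the low W bits, then masking to the table size), so treat it as an approximation. br_shift=3 (8-byte PISA instructions) is an assumption, and both function names are hypothetical. Arguments follow the -bpred:2lev order <l1size> <l2size> <hist_size> <xor>.

```python
def l1_index(baddr: int, l1size: int, br_shift: int = 3) -> int:
    """Select a first-level history register (l1size assumed a power of two)."""
    return (baddr >> br_shift) & (l1size - 1)


def l2_index(history: int, baddr: int, l2size: int, hist_size: int,
             use_xor: bool, br_shift: int = 3) -> int:
    """Select a second-level counter from a W-bit history and the branch address."""
    pc = baddr >> br_shift                 # drop instruction-alignment bits
    low_mask = (1 << hist_size) - 1
    if use_xor:
        # gshare-style: low W bits are history XOR address, address bits above
        idx = ((history ^ pc) & low_mask) | (pc << hist_size)
    else:
        # history in the low W bits, address bits concatenated above them
        idx = history | (pc << hist_size)
    return idx & (l2size - 1)              # l2size assumed a power of two


# Default -bpred:2lev 1 1024 8 0: one 8-bit global history and 1024 counters.
# Since M = 1024 > 2^8, two address bits extend the index (a GAp shape).
hist = 0b10110011
print(l1_index(0x400148, 1), l2_index(hist, 0x400148, 1024, 8, False))
```

With M == 2^W and X = 0 the address bits are masked away entirely, which is what makes the GAg configuration purely history-indexed.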

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
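A minimal sketch that parses a <config> string in the format above and derives its size, assuming total capacity = <nsets> x <assoc> x <bsize>. For a TLB the same fields give the number of entries and the address reach. CacheConfig and parse_cache_config are hypothetical names.

```python
from typing import NamedTuple

# Replacement-policy letters accepted by the <repl> field.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        """Total number of blocks (or TLB entries)."""
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        """Bytes of data held (or address reach, for a TLB)."""
        return self.blocks * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL_POLICIES[repl])


for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"]:
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.blocks, cfg.capacity_bytes // 1024, "KiB", cfg.repl)
```

Under this reading the dl1 and il1 caches in these runs each hold 64 KiB, ul2 holds 512 KiB, and the 128-entry dtlb covers 512 KiB of 4 KiB pages.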

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
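The pointer-style arguments can be resolved mechanically. This sketch maps each logical level to the physical cache that serves it, identifying caches by the <name> field of their config; the function names are hypothetical. With the options used in these runs, il2 resolves to ul2, which is consistent with the statistics reporting il1, dl1 and ul2 but no separate il2.

```python
from typing import Dict, Optional


def _cache_name(spec: str) -> Optional[str]:
    """Return the <name> field of a cache config, or None for 'none'."""
    if spec == "none":
        return None
    if ":" not in spec:
        raise ValueError(f"unsupported cache reference {spec!r}")
    return spec.split(":", 1)[0]


def resolve_hierarchy(dl1: str, dl2: str, il1: str, il2: str) -> Dict[str, Optional[str]]:
    """Map each logical cache level to the physical cache serving it."""
    physical = {"dl1": _cache_name(dl1), "dl2": _cache_name(dl2)}
    # il1 may point at dl1 or dl2; il2 may only point at dl2.
    il1_cache = physical[il1] if il1 in ("dl1", "dl2") else _cache_name(il1)
    il2_cache = physical["dl2"] if il2 == "dl2" else _cache_name(il2)
    return {"dl1": physical["dl1"], "dl2": physical["dl2"],
            "il1": il1_cache, "il2": il2_cache}


# Options used by the runs in this log (il2 points at dl2).
print(resolve_hierarchy("dl1:512:64:2:l", "ul2:1024:64:8:l", "il1:512:64:2:l", "dl2"))
# Fully unified example from the help text (il2 left at its default, dl2).
print(resolve_hierarchy("ul1:256:32:1:l", "ul2:1024:64:2:l", "dl1", "dl2"))
```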



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375124 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9368485 # total simulation time in cycles
sim_IPC                      1.4219 # instructions per cycle
sim_CPI                      0.7033 # cycles per instruction
sim_exec_BW                  1.4277 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36501171 # cumulative IFQ occupancy
IFQ_fcount                  8974648 # cumulative IFQ full count
ifq_occupancy                3.8962 # avg IFQ occupancy (insns)
ifq_rate                     1.4277 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7290 # avg IFQ occupant latency (cycles)
ifq_full                     0.9580 # fraction of time (cycles) IFQ was full
RUU_count                 146917395 # cumulative RUU occupancy
RUU_fcount                  8833727 # cumulative RUU full count
ruu_occupancy               15.6821 # avg RUU occupancy (insns)
ruu_rate                     1.4277 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.9844 # avg RUU occupant latency (cycles)
ruu_full                     0.9429 # fraction of time (cycles) RUU was full
LSQ_count                  77199815 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2404 # avg LSQ occupancy (insns)
lsq_rate                     1.4277 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7719 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  243832536 # total number of slip cycles
avg_sim_slip                18.3045 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen           89860 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., dir-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
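Most derived statistics above are ratios of the raw counters. A minimal sketch, assuming the `name value # description` line layout shown here, that parses a statistics block and recomputes sim_IPC, sim_CPI, dl1.miss_rate and the bimodal direction-prediction rate for this run. Non-numeric values (hex addresses, 760k) are skipped; parse_stats is a hypothetical helper.

```python
import re
from typing import Dict

# "<name> <value> # <description>"; descriptions are ignored.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> Dict[str, float]:
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        try:
            stats[match.group(1)] = float(match.group(2))
        except ValueError:
            pass  # e.g. 0x00400000, 760k
    return stats


# Counters copied from the statistics block above.
SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   9368485 # total simulation time in cycles
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

s = parse_stats(SAMPLE)
derived = {
    "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
    "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
    "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
}
for name, value in derived.items():
    print(f"{name:16s} {value:.4f}")
```

Rounded to four places these reproduce the printed 1.4219, 0.7033, 0.0466 and 0.9833.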

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:22:12 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of instructions to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses OK)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12578410 # total simulation time in cycles
sim_IPC                      1.6997 # instructions per cycle
sim_CPI                      0.5883 # cycles per instruction
sim_exec_BW                  1.7053 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49163302 # cumulative IFQ occupancy
IFQ_fcount                 11671653 # cumulative IFQ full count
ifq_occupancy                3.9085 # avg IFQ occupancy (insns)
ifq_rate                     1.7053 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2920 # avg IFQ occupant latency (cycles)
ifq_full                     0.9279 # fraction of time (cycles) IFQ was full
RUU_count                 199789835 # cumulative RUU occupancy
RUU_fcount                 12451528 # cumulative RUU full count
ruu_occupancy               15.8836 # avg RUU occupancy (insns)
ruu_rate                     1.7053 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3142 # avg RUU occupant latency (cycles)
ruu_full                     0.9899 # fraction of time (cycles) RUU was full
LSQ_count                  64049275 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0920 # avg LSQ occupancy (insns)
lsq_rate                     1.7053 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9860 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  291621911 # total number of slip cycles
avg_sim_slip                13.6404 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., dir-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
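The IFQ, RUU and LSQ figures are related by Little's law: average occupancy equals the dispatch rate times the average time an entry stays in the queue. Judging from the printed values (an inference, not something the log documents), occupancy is the cumulative count divided by sim_cycle, the rate is sim_total_insn divided by sim_cycle, and latency is the cumulative count divided by sim_total_insn. The sketch recomputes the RUU and LSQ lines of this filter run; queue_metrics is a hypothetical helper.

```python
from typing import NamedTuple


class QueueMetrics(NamedTuple):
    occupancy: float  # average entries in the queue per cycle
    rate: float       # instructions dispatched per cycle
    latency: float    # average cycles an instruction spends in the queue


def queue_metrics(cumulative_count: int, total_insn: int, cycles: int) -> QueueMetrics:
    return QueueMetrics(
        occupancy=cumulative_count / cycles,
        rate=total_insn / cycles,
        latency=cumulative_count / total_insn,
    )


# sim_total_insn and sim_cycle from the filter run above.
TOTAL_INSN, CYCLES = 21450114, 12578410
ruu = queue_metrics(199789835, TOTAL_INSN, CYCLES)
lsq = queue_metrics(64049275, TOTAL_INSN, CYCLES)

for name, q in (("ruu", ruu), ("lsq", lsq)):
    # Little's law: occupancy == rate * latency (algebraically exact here).
    print(f"{name}: occupancy={q.occupancy:.4f} rate={q.rate:.4f} "
          f"latency={q.latency:.4f} rate*latency={q.rate * q.latency:.4f}")
```

The recomputed values match the printed ruu_occupancy, ruu_rate, ruu_latency, lsq_occupancy and lsq_latency to four decimal places.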

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:22:26 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of instructions to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
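The cache options above use sim-outorder's `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, so the capacity of each cache is nsets * bsize * assoc. The Python sketch below decodes these strings. It assumes <bsize> is in bytes and that set counts and block sizes are powers of two. The helper `cache_geometry` is hypothetical and is not part of SimpleScalar.

```python
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}

def cache_geometry(config: str) -> dict:
    """Decode '<name>:<nsets>:<bsize>:<assoc>:<repl>', e.g. 'dl1:512:64:2:l'."""
    name, nsets, bsize, assoc, repl = config.split(":")
    nsets, bsize, assoc = int(nsets), int(bsize), int(assoc)
    # Sanity check (our assumption): set count and block size are powers of two.
    for v in (nsets, bsize):
        if v <= 0 or v & (v - 1):
            raise ValueError(f"{v} is not a power of two")
    return {
        "name": name,
        "capacity_bytes": nsets * bsize * assoc,  # total data storage
        "index_bits": nsets.bit_length() - 1,     # log2(nsets): set-select bits
        "offset_bits": bsize.bit_length() - 1,    # log2(bsize): byte-in-block bits
        "replacement": REPLACEMENT[repl],
    }

for cfg in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
    g = cache_geometry(cfg)
    print(f"{g['name']}: {g['capacity_bytes'] // 1024} KiB, "
          f"{g['index_bits']} index bits, {g['offset_bits']} offset bits, "
          f"{g['replacement']}")
```

For the options in this run, the sketch reports 64 KiB for dl1 and il1 and 512 KiB for the unified ul2. The TLB options use the same format, with <bsize> acting as the page size (4096 bytes here).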


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861251 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37613743 # total simulation time in cycles
sim_IPC                      0.7407 # instructions per cycle
sim_CPI                      1.3501 # cycles per instruction
sim_exec_BW                  0.7407 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 150378985 # cumulative IFQ occupancy
IFQ_fcount                 37594506 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7407 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3974 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 601518674 # cumulative RUU occupancy
RUU_fcount                 37593770 # cumulative RUU full count
ruu_occupancy               15.9920 # avg RUU occupancy (insn's)
ruu_rate                     0.7407 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.5898 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 182838929 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8610 # avg LSQ occupancy (insn's)
lsq_rate                     0.7407 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5625 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  820866262 # total number of slip cycles
avg_sim_slip                29.4643 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
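Several of the derived figures above are plain ratios of the raw counters. sim_IPC is sim_num_insn / sim_cycle, sim_CPI is its inverse, and each `*.miss_rate` is misses / accesses. The Python sketch below re-derives a few of them from a copy of this run's counters. The parser (`parse_stats`, a hypothetical helper) assumes the `<name> <value> # <description>` line layout shown here and skips non-numeric values such as hex addresses or `1480k`.

```python
import re

# A subset of this run's counters, copied verbatim from the block above.
STATS_TEXT = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  37613743 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
ul2.accesses                3529698 # total number of accesses
ul2.misses                    68321 # total number of misses
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
"""

# "<name> <value> # <description>"; names start with a letter or underscore,
# which skips option lines ("-seed 1 # ...") and commented options ("# -h ...").
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#")

def parse_stats(text: str) -> dict:
    """Map stat name -> float, ignoring non-numeric values such as
    hex addresses, '1480k' or '<error: divide by zero>'."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m is None:
            continue
        try:
            stats[m.group(1)] = float(m.group(2))
        except ValueError:
            pass  # value is not a plain number
    return stats

s = parse_stats(STATS_TEXT)
ipc = s["sim_num_insn"] / s["sim_cycle"]
cpi = s["sim_cycle"] / s["sim_num_insn"]
dl1_miss_rate = s["dl1.misses"] / s["dl1.accesses"]
ul2_miss_rate = s["ul2.misses"] / s["ul2.accesses"]
dir_rate = s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"]
print(f"IPC={ipc:.4f} CPI={cpi:.4f} dl1={dl1_miss_rate:.4f} "
      f"ul2={ul2_miss_rate:.4f} bpred_dir={dir_rate:.4f}")
```

Rounded to four places, these match the printed sim_IPC, sim_CPI, dl1.miss_rate, ul2.miss_rate and bpred_bimod.bpred_dir_rate.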

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:22:50 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
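The only difference between this run's option dump and the preceding run's is -mem:width, which is 32 bytes here instead of 16. The Python sketch below estimates the cost of filling one 64-byte ul2 block from memory. It assumes the stock sim-outorder memory model: a block is transferred in ceil(bsize / width) chunks, the first costing <first_chunk> cycles and each later chunk <inter_chunk> cycles. This build also accepts -mem:maxBurstLength and -mem:minBurstLength. We believe those are local additions that are not present in stock SimpleScalar 3.0, so the build's actual timing model may differ.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block from main memory, assuming the stock
    sim-outorder model: ceil(block_bytes / bus_width) chunks, the first
    costing first_chunk cycles and each remaining chunk inter_chunk cycles."""
    chunks = (block_bytes + bus_width - 1) // bus_width  # ceiling division
    if chunks <= 0:
        raise ValueError("block_bytes must be positive")
    return first_chunk + inter_chunk * (chunks - 1)

# 64-byte ul2 blocks with -mem:lat 76 2, at the two bus widths
for width in (16, 32):
    print(f"-mem:width {width}: {mem_access_latency(64, width, 76, 2)} cycles")
```

Under that assumption, doubling the bus width from 16 to 32 bytes lowers the per-block memory latency from 82 to 78 cycles, because the 76-cycle first chunk dominates.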


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6958163 # total simulation time in cycles
sim_IPC                      1.8837 # instructions per cycle
sim_CPI                      0.5309 # cycles per instruction
sim_exec_BW                  1.8897 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25895200 # cumulative IFQ occupancy
IFQ_fcount                  6348197 # cumulative IFQ full count
ifq_occupancy                3.7216 # avg IFQ occupancy (insn's)
ifq_rate                     1.8897 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9694 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 106865009 # cumulative RUU occupancy
RUU_fcount                  5784358 # cumulative RUU full count
ruu_occupancy               15.3582 # avg RUU occupancy (insn's)
ruu_rate                     1.8897 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.1272 # avg RUU occupant latency (cycle's)
ruu_full                     0.8313 # fraction of time (cycle's) RUU was full
LSQ_count                  32712634 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7013 # avg LSQ occupancy (insn's)
lsq_rate                     1.8897 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4878 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  156627441 # total number of slip cycles
avg_sim_slip                11.9498 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
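Judging from the printed values, the queue statistics above are consistent with Little's law. Each average *_occupancy is the cumulative *_count divided by sim_cycle, each *_rate is sim_total_insn / sim_cycle, and each *_latency equals occupancy / rate (that is, count / sim_total_insn). The Python sketch below checks this relation against this run's counters. It also shows one way to handle the zero-denominator case, which sim-outorder prints as `<error: divide by zero>` for bpred_jr_non_ras_rate.PP because no non-RAS JRs were seen. Returning None in that case is our own choice and does not reflect the simulator's behavior. The helpers `queue_stats` and `safe_ratio` are hypothetical.

```python
def queue_stats(count: int, total_insn: int, cycles: int):
    """Return (avg occupancy, avg dispatch rate, avg latency) for one queue."""
    occupancy = count / cycles          # average entries resident per cycle
    rate = total_insn / cycles          # average instructions entering per cycle
    latency = occupancy / rate          # Little's law: W = L / lambda
    return occupancy, rate, latency

def safe_ratio(num: float, den: float):
    """num / den, or None when den is zero (printed by sim-outorder as
    '<error: divide by zero>')."""
    return None if den == 0 else num / den

SIM_CYCLE = 6958163
SIM_TOTAL_INSN = 13148990  # committed plus mis-speculated instructions
COUNTS = {"ifq": 25895200, "ruu": 106865009, "lsq": 32712634}

for name, count in COUNTS.items():
    occ, rate, lat = queue_stats(count, SIM_TOTAL_INSN, SIM_CYCLE)
    print(f"{name}_occupancy={occ:.4f} {name}_rate={rate:.4f} {name}_latency={lat:.4f}")

# bpred_jr_non_ras_rate.PP: 0 non-RAS JR hits out of 0 seen
print(safe_ratio(0, 0))  # None
```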

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:22:59 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
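The range syntax above can be sketched as a small Python parser. This is an illustration, not SimpleScalar's own range-parsing code. It assumes numeric values are written in decimal or with a `0x` prefix. It resolves program symbols such as `main` through a caller-supplied dictionary; that `symbols` argument, like the function names, is hypothetical.

```python
from typing import Dict, Optional, Tuple

# An endpoint is (kind, value) with kind in {"addr", "cycle", "inst"}.
Pos = Tuple[str, int]


def _value(text: str, symbols: Dict[str, int]) -> int:
    """Decimal or 0x-prefixed literal, otherwise a program symbol name."""
    try:
        return int(text, 0)
    except ValueError:
        return symbols[text]  # raises KeyError for an unknown symbol


def _pos(text: str, symbols: Dict[str, int]) -> Pos:
    if text.startswith("@"):
        return ("addr", _value(text[1:], symbols))
    if text.startswith("#"):
        return ("cycle", _value(text[1:], symbols))
    return ("inst", _value(text, symbols))  # no prefix: instruction count


def parse_range(spec: str, symbols: Optional[Dict[str, int]] = None
                ) -> Tuple[Optional[Pos], Optional[Pos]]:
    """Split '<start>:<end>'; a missing endpoint is returned as None (open)."""
    symbols = symbols or {}
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = _pos(start_txt, symbols) if start_txt else None
    if end_txt.startswith("+"):
        if start is None:
            raise ValueError("a '+' end needs an explicit start")
        # '+N' is relative to the start and inherits the start's kind.
        return start, (start[0], start[1] + _value(end_txt[1:], symbols))
    return start, (_pos(end_txt, symbols) if end_txt else None)


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
        print(f"{spec:10s} -> {parse_range(spec)}")
```

In this sketch, `1000:+100` and `1000:1100` parse to the same pair of instruction-count endpoints. `:` yields two open ends, matching "the entire execution is traced".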

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (N, M, W, X):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
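The following Python sketch makes the sample table concrete. It builds the `-bpred:2lev` argument for four of the named schemes, writing the values in the option's own order (`<l1size> <l2size> <hist_size> <xor>`, i.e. N, M, W, X). Read that way, the default `-bpred:2lev 1 1024 8 0` in the listing above is a GAp predictor with W = 8 and M = 1024 > 2^8. A few choices here are assumptions of this sketch:
- The function name is hypothetical.
- PAp is left out.
- The power-of-two check is a sanity check, not a quoted simulator rule.

```python
def two_level_config(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Return '-bpred:2lev <l1size> <l2size> <hist_size> <xor>' for a scheme.

    w: history width W; n: first-level entries N (PAg only);
    m: second-level counters M (GAp only, must exceed 2**w).
    """
    if scheme == "GAg":
        l1, l2, xor = 1, 2 ** w, 0
    elif scheme == "GAp":
        if m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        l1, l2, xor = 1, m, 0
    elif scheme == "PAg":
        l1, l2, xor = n, 2 ** w, 0
    elif scheme == "gshare":
        l1, l2, xor = 1, 2 ** w, 1
    else:
        raise ValueError(f"scheme {scheme!r} not covered by this sketch")
    for size in (l1, l2):
        # Assumed sanity check: both table sizes are powers of two.
        if size < 1 or size & (size - 1):
            raise ValueError(f"table size {size} is not a power of two")
    return f"-bpred:2lev {l1} {l2} {w} {xor}"


if __name__ == "__main__":
    print(two_level_config("gshare", 10))      # -bpred:2lev 1 1024 10 1
    print(two_level_config("GAp", 8, m=1024))  # -bpred:2lev 1 1024 8 0
```

The generated string only matters together with `-bpred 2lev`. The runs in this log select `-bpred bimod`, so their `-bpred:2lev` values are unused.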

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
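Total capacity follows directly from this format: `<nsets> * <bsize> * <assoc>` bytes. Below is a minimal Python sketch; the `CacheConfig` type and `parse_cache_config` name are hypothetical. Applied to this run's settings it gives:
- 64 KiB for `dl1:512:64:2:l`;
- 512 KiB for `ul2:1024:64:8:l`;
- 512 KiB of TLB reach for the `-tlb:dtlb` entry `dtlb:32:4096:4:l`, where the block size is the 4096-byte page.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (the page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name:5s} {cfg.capacity_bytes // 1024:4d} KiB")
```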

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
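The pointer semantics can be sketched in Python. A `dl1` or `dl2` value on an instruction level means that level is served by the corresponding data-side cache. The helper below is hypothetical and not part of SimpleScalar. It maps each logical level to the name of the cache that actually serves it. It defaults `il2` to `dl2`, as the option listing above shows, and accepts only the pointer values that listing allows. For the runs in this log it resolves both `il2` and `dl2` to `ul2`, which is why only one L2 appears in the statistics.

```python
from typing import Dict, Optional, Tuple

# Pointer values each instruction level accepts, per the option listing.
POINTERS: Dict[str, Tuple[str, ...]] = {"il1": ("dl1", "dl2"), "il2": ("dl2",)}


def physical_caches(opts: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map il1/il2/dl1/dl2 to the name of the cache serving each (None = no cache)."""

    def own_name(spec: str) -> Optional[str]:
        return None if spec == "none" else spec.split(":", 1)[0]

    resolved = {lvl: own_name(opts[lvl]) for lvl in ("dl1", "dl2")}
    for lvl, allowed in POINTERS.items():
        spec = opts.get(lvl, "dl2") if lvl == "il2" else opts[lvl]
        if spec in allowed:
            resolved[lvl] = resolved[spec]   # shared with the data side
        elif spec in ("dl1", "dl2"):
            raise ValueError(f"-cache:{lvl} cannot point at {spec}")
        else:
            resolved[lvl] = own_name(spec)   # a separate instruction cache
    return resolved


if __name__ == "__main__":
    print(physical_caches({"il1": "il1:512:64:2:l", "il2": "dl2",
                           "dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l"}))
```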



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264986 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958479.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6557392 # total simulation time in cycles
sim_IPC                      1.7662 # instructions per cycle
sim_CPI                      0.5662 # cycles per instruction
sim_exec_BW                  1.8704 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19477565 # cumulative IFQ occupancy
IFQ_fcount                  4041232 # cumulative IFQ full count
ifq_occupancy                2.9703 # avg IFQ occupancy (insn's)
ifq_rate                     1.8704 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5881 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6163 # fraction of time (cycle's) IFQ was full
RUU_count                  80157965 # cumulative RUU occupancy
RUU_fcount                  3439240 # cumulative RUU full count
ruu_occupancy               12.2241 # avg RUU occupancy (insn's)
ruu_rate                     1.8704 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.5355 # avg RUU occupant latency (cycle's)
ruu_full                     0.5245 # fraction of time (cycle's) RUU was full
LSQ_count                  32778756 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.9987 # avg LSQ occupancy (insn's)
lsq_rate                     1.8704 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6725 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  127003830 # total number of slip cycles
avg_sim_slip                10.9661 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820213 # total number of accesses
il1.hits                   12819996 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820213 # total number of accesses
itlb.hits                  12820206 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917910 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
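Several of the figures above are ratios of the raw counters, so they can be recomputed as a consistency check. The Python sketch below parses `name value # description` lines; the function names are hypothetical. It recomputes:
- IPC (`sim_num_insn / sim_cycle`) and CPI;
- the dl1 and ul2 miss rates;
- the traffic expected at the unified L2.

For that last figure, in this run the il1 misses, dl1 misses and dl1 writebacks sum to 217 + 11205 + 7079 = 18501, which is exactly `ul2.accesses`. The fft run that follows matches the same way (727 + 287722 + 143444 = 431893). Treat that identity as an observation from this log, not a documented guarantee.

```python
from typing import Dict, Union

Value = Union[int, float, str]

# A few lines copied from the statistics above.
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_cycle                   6557392 # total simulation time in cycles
il1.misses                      217 # total number of misses
dl1.accesses                4497783 # total number of accesses
dl1.misses                    11205 # total number of misses
dl1.writebacks                 7079 # total number of writebacks
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   292k # total size of memory pages allocated
"""


def parse_stats(text: str) -> Dict[str, Value]:
    """Parse 'name value # description' lines; skip lines that don't fit."""
    stats: Dict[str, Value] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # blank lines, banners, '# -option' lines, etc.
        name, raw = fields
        value: Value
        try:
            value = int(raw, 0)        # counts and 0x... addresses
        except ValueError:
            try:
                value = float(raw)     # rates and '1880358.0000'-style counts
            except ValueError:
                value = raw            # e.g. '292k'
        stats[name] = value
    return stats


def summarize(s: Dict[str, Value]) -> Dict[str, float]:
    """Recompute headline ratios from the raw counters of one run."""
    return {
        "ipc": s["sim_num_insn"] / s["sim_cycle"],
        "cpi": s["sim_cycle"] / s["sim_num_insn"],
        "dl1_miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2_miss_rate": s["ul2.misses"] / s["ul2.accesses"],
        # Requests expected at the unified L2: I-side misses, D-side misses,
        # and D-side writebacks.
        "l2_inbound": s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"],
    }


if __name__ == "__main__":
    for key, val in summarize(parse_stats(SAMPLE)).items():
        print(f"{key:14s} {val:.4f}")
```

Feed it one run at a time. This log concatenates several runs separated by dashed lines, so parsing the whole file at once would let later runs overwrite earlier values.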

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:23:07 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375681 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10682565 # total simulation time in cycles
sim_IPC                      1.2470 # instructions per cycle
sim_CPI                      0.8019 # cycles per instruction
sim_exec_BW                  1.2521 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  41557566 # cumulative IFQ occupancy
IFQ_fcount                 10238747 # cumulative IFQ full count
ifq_occupancy                3.8902 # avg IFQ occupancy (insn's)
ifq_rate                     1.2521 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.1069 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9585 # fraction of time (cycle's) IFQ was full
RUU_count                 167159308 # cumulative RUU occupancy
RUU_fcount                 10097689 # cumulative RUU full count
ruu_occupancy               15.6479 # avg RUU occupancy (insn's)
ruu_rate                     1.2521 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.4973 # avg RUU occupant latency (cycle's)
ruu_full                     0.9452 # fraction of time (cycle's) RUU was full
LSQ_count                  89245824 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3543 # avg LSQ occupancy (insn's)
lsq_rate                     1.2521 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6722 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  276119069 # total number of slip cycles
avg_sim_slip                20.7282 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
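The queue statistics in this run are consistent with simple definitions over the cumulative counters:
- occupancy = `<Q>_count / sim_cycle` (for the RUU, 167159308 / 10682565 ≈ 15.6479);
- rate = `sim_total_insn / sim_cycle`;
- latency = `<Q>_count / sim_total_insn`;
- full fraction = `<Q>_fcount / sim_cycle`.

Occupancy is therefore rate times latency, which is Little's law. The sketch below recomputes these figures; its helper names are hypothetical. It also recomputes the branch-predictor rates. In both runs of this log `misses` equals `updates - dir_hits` (here 387182 - 380716 = 6466).

```python
from typing import Dict


def queue_metrics(count: int, full_count: int, total_insn: int,
                  cycles: int) -> Dict[str, float]:
    """Averages for one queue (IFQ, RUU or LSQ) from its cumulative counters.

    count:      cumulative occupancy, e.g. RUU_count
    full_count: cycles in which the queue was full, e.g. RUU_fcount
    total_insn: sim_total_insn (committed plus mis-speculated)
    cycles:     sim_cycle
    """
    return {
        "occupancy": count / cycles,     # average entries held per cycle
        "rate": total_insn / cycles,     # average dispatch rate (insn/cycle)
        "latency": count / total_insn,   # average cycles held per instruction
        "full": full_count / cycles,     # fraction of cycles the queue was full
    }


def bpred_rates(updates: int, addr_hits: int, dir_hits: int) -> Dict[str, float]:
    """Branch-predictor rates; 'misses' is updates minus direction hits."""
    return {
        "addr_rate": addr_hits / updates,
        "dir_rate": dir_hits / updates,
        "misses": updates - dir_hits,
    }


if __name__ == "__main__":
    # fft run: RUU counters and totals copied from the statistics above
    ruu = queue_metrics(167159308, 10097689, 13375681, 10682565)
    print({k: round(v, 4) for k, v in ruu.items()})
    print(bpred_rates(387182, 380510, 380716))
```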

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:23:18 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
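The memory options above (`-mem:lat 76 2`, `-mem:width 32`) set the cost of an L2 miss. The sketch below mirrors the chunked latency model of stock sim-outorder: the first chunk costs `first_chunk` cycles, and each further bus-width chunk of the block costs `inter_chunk` cycles. The `-mem:minBurstLength` and `-mem:maxBurstLength` options do not appear in the stock 3.0 option list as far as we know (this looks like a modified build). The sketch ignores them, so treat its result as an approximation.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from main memory (stock chunked model).

    Assumption: -mem:minBurstLength / -mem:maxBurstLength are ignored.
    """
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block_size and bus_width must be positive")
    # Number of bus-width transfers, rounded up to whole chunks.
    chunks = (block_size + bus_width - 1) // bus_width
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l); this run uses -mem:lat 76 2.
for width in (32, 64):
    print(f"-mem:width {width}: {mem_access_latency(64, width, 76, 2)} cycles")
```

With `-mem:width 32`, a 64-byte block takes two chunks (76 + 2 = 78 cycles). With `-mem:width 64`, as in the later matrix run, it fits in one chunk (76 cycles).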

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  If
  the second argument is prefixed with `+', it is relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
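The range grammar above can be checked with a small parser. The `PtraceRange` type and `parse_ptrace_range` function are hypothetical helpers, not part of SimpleScalar. The parser treats any non-numeric value as a program symbol. It resolves a `+` end only when the start is numeric, because resolving a symbol such as `main` would need the program's symbol table.

```python
from typing import NamedTuple, Optional, Union

Value = Union[int, str]  # str means an unresolved program symbol, e.g. "main"


class PtraceRange(NamedTuple):
    kind: str               # "addr" (@), "cycle" (#) or "insn" (no prefix)
    start: Optional[Value]  # None: trace from the beginning
    end: Optional[Value]    # None: trace to the end
    end_is_offset: bool     # True if end is a '+' offset that could not be resolved


_KINDS = {"@": "addr", "#": "cycle"}


def _value(text: str) -> Optional[Value]:
    if not text:
        return None
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)
    return text  # anything else is treated as a program symbol


def parse_ptrace_range(spec: str) -> PtraceRange:
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    kind = "insn"
    if start_txt[:1] in _KINDS:
        kind, start_txt = _KINDS[start_txt[0]], start_txt[1:]
    relative = end_txt.startswith("+")
    if relative:
        end_txt = end_txt[1:]
    elif end_txt[:1] in _KINDS:
        kind, end_txt = _KINDS[end_txt[0]], end_txt[1:]
    start, end = _value(start_txt), _value(end_txt)
    if relative and isinstance(start, int) and isinstance(end, int):
        return PtraceRange(kind, start, start + end, False)  # 1000:+100 -> 1000:1100
    return PtraceRange(kind, start, end, relative)


for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(f"{example:12} -> {parse_ptrace_range(example)}")
```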

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X   (the order of the -bpred:2lev arguments)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
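To show what the gshare sample configuration above does, here is a textbook gshare sketch. It uses one global history register of W bits, XORs that history with branch-address bits, and uses the result to index 2^W two-bit counters. With W = 8 and the `<l1size> <l2size> <hist_size> <xor>` argument order from the option dump, this roughly corresponds to `-bpred:2lev 1 256 8 1`. The `Gshare` class is a hypothetical illustration. SimpleScalar's bpred.c may choose its index bits differently. Dropping three low PC bits assumes 8-byte PISA instructions.

```python
class Gshare:
    """Textbook gshare predictor: one global history register (N = 1) whose
    W bits are XOR-ed with branch-address bits to index 2^W two-bit counters.

    Illustrative only; SimpleScalar's bpred.c may choose index bits differently.
    """

    def __init__(self, hist_bits: int) -> None:
        self.mask = (1 << hist_bits) - 1
        self.history = 0                        # global outcome history (W bits)
        self.counters = [2] * (1 << hist_bits)  # 2-bit counters, start weakly taken

    def _index(self, pc: int) -> int:
        # Assumption: drop 3 low PC bits because PISA instructions are 8 bytes.
        return ((pc >> 3) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        # Shift the actual outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


# A loop branch that is taken three times, then falls through, repeatedly.
pred = Gshare(hist_bits=8)
pattern = [True, True, True, False]
late_misses = 0
for trip in range(50):
    for taken in pattern:
        if trip >= 10 and pred.predict(0x400140) != taken:
            late_misses += 1
        pred.update(0x400140, taken)
print("mispredictions after warm-up:", late_misses)
```

Because each position in the loop sees a distinct 8-bit history, every history gets its own counter, so the count after warm-up is 0. A bimodal predictor like the one used in these runs keeps one counter per branch. That counter stays in a taken state, so it mispredicts the fall-through once per trip.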

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
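A cache's capacity follows directly from the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format: sets x ways x block size. The sketch below parses a spec string and computes that product. `CacheConfig` and `parse_cache_config` are hypothetical helpers written for this note. They check only the replacement letter and positivity, not every constraint the simulator may enforce.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l' (LRU), 'f' (FIFO) or 'r' (random)

    @property
    def capacity(self) -> int:
        """Total capacity in bytes: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)
    if min(cfg.nsets, cfg.bsize, cfg.assoc) <= 0:
        raise ValueError("nsets, bsize and assoc must be positive")
    return cfg


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity // 1024} KiB")
```

For the runs in this log, dl1 and il1 are 64 KiB each and ul2 is 512 KiB. For the TLB specs, bsize is the 4096-byte page size, so the same product gives the TLB reach: 512 KiB for dtlb and 256 KiB for itlb.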

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12735630 # total simulation time in cycles
sim_IPC                      1.6787 # instructions per cycle
sim_CPI                      0.5957 # cycles per instruction
sim_exec_BW                  1.6843 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
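The headline rates above are simple ratios of the raw counters. The sketch below recomputes them for this run (filter). The `wrong_path_frac` entry is not a simulator statistic. It is derived here as the share of executed instructions that were never committed.

```python
def derived_rates(num_insn: int, total_insn: int, num_branches: int, cycles: int) -> dict:
    return {
        "sim_IPC": num_insn / cycles,        # committed instructions per cycle
        "sim_CPI": cycles / num_insn,        # reciprocal of IPC
        "sim_exec_BW": total_insn / cycles,  # executed (incl. mis-speculated) per cycle
        "sim_IPB": num_insn / num_branches,  # committed instructions per committed branch
        # Derived here, not printed by sim-outorder:
        "wrong_path_frac": (total_insn - num_insn) / total_insn,
    }


stats = derived_rates(num_insn=21379256, total_insn=21450114,
                      num_branches=1647342, cycles=12735630)
for key, value in stats.items():
    print(f"{key:16} {value:.4f}")
```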
IFQ_count                  49743462 # cumulative IFQ occupancy
IFQ_fcount                 11816693 # cumulative IFQ full count
ifq_occupancy                3.9059 # avg IFQ occupancy (insn's)
ifq_rate                     1.6843 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3190 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9278 # fraction of time (cycle's) IFQ was full
RUU_count                 202113905 # cumulative RUU occupancy
RUU_fcount                 12596568 # cumulative RUU full count
ruu_occupancy               15.8700 # avg RUU occupancy (insn's)
ruu_rate                     1.6843 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.4225 # avg RUU occupant latency (cycle's)
ruu_full                     0.9891 # fraction of time (cycle's) RUU was full
LSQ_count                  64778115 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0864 # avg LSQ occupancy (insn's)
lsq_rate                     1.6843 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0199 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  294674401 # total number of slip cycles
avg_sim_slip                13.7832 # the average slip between issue and retirement
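The IFQ/RUU/LSQ averages above can be reproduced from the cumulative counters. Occupancy is the cumulative count divided by cycles. The dispatch rate is total executed instructions divided by cycles. Latency is occupancy divided by rate, which is Little's law. The formulas below were chosen because they reproduce the printed values for this run. They are a consistency check, not the simulator's own code.

```python
def queue_stats(count: int, full_count: int, cycles: int, total_insn: int) -> dict:
    """Recompute per-queue averages from cumulative counters.

    count:      cumulative occupancy (e.g. IFQ_count)
    full_count: number of cycles the queue was full (e.g. IFQ_fcount)
    """
    occupancy = count / cycles        # avg entries resident per cycle
    rate = total_insn / cycles        # avg instructions passing through per cycle
    latency = count / total_insn      # Little's law: occupancy / rate
    return {"occupancy": occupancy, "rate": rate,
            "latency": latency, "full": full_count / cycles}


CYCLES, TOTAL_INSN = 12735630, 21450114
for name, count, fcount in [("ifq", 49743462, 11816693),
                            ("ruu", 202113905, 12596568),
                            ("lsq", 64778115, 0)]:
    s = queue_stats(count, fcount, CYCLES, TOTAL_INSN)
    print(name, {k: round(v, 4) for k, v in s.items()})

# avg_sim_slip: total slip cycles divided by committed instructions.
print("avg_sim_slip", round(294674401 / 21379256, 4))
```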
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
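Each predictor rate above is a hits/total ratio. This explains the `<error: divide by zero>` entry: no non-RAS JRs were seen, so that denominator is 0. The sketch returns `None` in that case. In both runs of this log, `misses` equals `updates - dir_hits`, which the last line checks for this run.

```python
from typing import Optional


def rate(hits: int, total: int) -> Optional[float]:
    """hits / total, or None where sim-outorder prints '<error: divide by zero>'."""
    return hits / total if total else None


bp = {"updates": 1647342, "addr_hits": 1638967, "dir_hits": 1639058,
      "jr_hits": 45, "jr_seen": 52, "jr_non_ras_hits": 0, "jr_non_ras_seen": 0,
      "used_ras": 52, "ras_hits": 45, "misses": 8284}

print("bpred_addr_rate", rate(bp["addr_hits"], bp["updates"]))
print("bpred_dir_rate ", rate(bp["dir_hits"], bp["updates"]))
print("bpred_jr_rate  ", rate(bp["jr_hits"], bp["jr_seen"]))
print("ras_rate       ", rate(bp["ras_hits"], bp["used_ras"]))
print("non-RAS JR rate", rate(bp["jr_non_ras_hits"], bp["jr_non_ras_seen"]))  # None
print("misses match:", bp["updates"] - bp["dir_hits"] == bp["misses"])
```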
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
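Because il2 is pointed at dl2 (`-cache:il2 dl2`), the ul2 counters include traffic from both L1 caches. In this run, il1 misses + dl1 misses + dl1 writebacks = 179 + 3131 + 791 = 4101, which is exactly `ul2.accesses`. The sketch checks that identity. It also contrasts ul2's local miss rate (what `ul2.miss_rate` reports) with a global rate per L1 reference. The function names are hypothetical. The identity is an observation about these runs, not a documented guarantee.

```python
def ul2_expected_accesses(stats: dict) -> int:
    """ul2 traffic expected when il2 is pointed at dl2 (-cache:il2 dl2).

    Assumption: every il1 miss, dl1 miss and dl1 writeback becomes one ul2 access.
    """
    return stats["il1.misses"] + stats["dl1.misses"] + stats["dl1.writebacks"]


def local_and_global_miss_rate(stats: dict) -> tuple:
    local = stats["ul2.misses"] / stats["ul2.accesses"]  # what ul2.miss_rate reports
    l1_refs = stats["il1.accesses"] + stats["dl1.accesses"]
    global_rate = stats["ul2.misses"] / l1_refs          # ul2 misses per L1 reference
    return local, global_rate


filter_run = {
    "il1.accesses": 21483054, "il1.misses": 179,
    "dl1.accesses": 4943189, "dl1.misses": 3131, "dl1.writebacks": 791,
    "ul2.accesses": 4101, "ul2.misses": 2489,
}
assert ul2_expected_accesses(filter_run) == filter_run["ul2.accesses"]
local, global_rate = local_and_global_miss_rate(filter_run)
print(f"ul2 local miss rate {local:.4f}, global {global_rate:.6f}")
```

The local rate of 0.6069 looks high, but ul2 sees only 4101 accesses in this run, so misses per L1 reference stay below 0.01%. The same identity holds for the alphaBlend run: 211 + 2572213 + 957274 = 3529698.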
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:23:32 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 25 # total simulation time in seconds
sim_inst_rate          1114387.6800 # simulation speed (in insts/sec)
sim_total_insn             27861811 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  40779773 # total simulation time in cycles
sim_IPC                      0.6832 # instructions per cycle
sim_CPI                      1.4638 # cycles per instruction
sim_exec_BW                  0.6832 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 162984585 # cumulative IFQ occupancy
IFQ_fcount                 40745906 # cumulative IFQ full count
ifq_occupancy                3.9967 # avg IFQ occupancy (insn's)
ifq_rate                     0.6832 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.8497 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 651944084 # cumulative RUU occupancy
RUU_fcount                 40745030 # cumulative RUU full count
ruu_occupancy               15.9869 # avg RUU occupancy (insn's)
ruu_rate                     0.6832 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.3992 # avg RUU occupant latency (cycle's)
ruu_full                     0.9991 # fraction of time (cycle's) RUU was full
LSQ_count                 198074499 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8572 # avg LSQ occupancy (insn's)
lsq_rate                     0.6832 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.1092 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  886526962 # total number of slip cycles
avg_sim_slip                31.8211 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
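Compared with the filter run, alphaBlend's dl1 miss rate rises from 0.0006 to 0.2972, and `sim_CPI` rises from 0.5957 to 1.4638. A back-of-the-envelope average memory access time (AMAT) helps put the cache numbers in cycles. The sketch uses the textbook two-level formula AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem). It takes the hit latencies from `-cache:dl1lat 3` and `-cache:dl2lat 12`, and a 78-cycle memory latency from `-mem:lat 76 2` with 32-byte-wide transfers of 64-byte blocks. This is an approximation. It assumes serial miss handling (the out-of-order core overlaps some of this), and it ignores writebacks and TLB misses. It also uses ul2's overall miss rate, which mixes instruction, data and writeback traffic.

```python
def amat(l1_lat: float, l1_miss: float, l2_lat: float, l2_miss: float,
         mem_lat: float) -> float:
    """Textbook two-level average memory access time, in cycles.

    Assumes misses are serviced serially (no overlap from the out-of-order
    core) and ignores writeback traffic and TLB misses.
    """
    return l1_lat + l1_miss * (l2_lat + l2_miss * mem_lat)


MEM_LAT = 76 + 2 * (64 // 32 - 1)   # -mem:lat 76 2, -mem:width 32, 64-byte ul2 blocks

alpha = amat(3, 2572213 / 8653398, 12, 68321 / 3529698, MEM_LAT)
filt = amat(3, 3131 / 4943189, 12, 2489 / 4101, MEM_LAT)
print(f"alphaBlend dl1 AMAT ~ {alpha:.2f} cycles, filter ~ {filt:.2f} cycles")
```

Even though filter's ul2 misses far more often locally, its dl1 almost never misses, so its estimated AMAT stays close to the 3-cycle hit latency. For alphaBlend, the estimate is about 7 cycles per data reference.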
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:23:57 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  If
  the second argument is prefixed with a `+', it is relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6808975 # total simulation time in cycles
sim_IPC                      1.9250 # instructions per cycle
sim_CPI                      0.5195 # cycles per instruction
sim_exec_BW                  1.9311 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25351648 # cumulative IFQ occupancy
IFQ_fcount                  6212309 # cumulative IFQ full count
ifq_occupancy                3.7233 # avg IFQ occupancy (insns)
ifq_rate                     1.9311 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9280 # avg IFQ occupant latency (cycles)
ifq_full                     0.9124 # fraction of time (cycles) IFQ was full
RUU_count                 104684873 # cumulative RUU occupancy
RUU_fcount                  5648470 # cumulative RUU full count
ruu_occupancy               15.3745 # avg RUU occupancy (insns)
ruu_rate                     1.9311 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9614 # avg RUU occupant latency (cycles)
ruu_full                     0.8296 # fraction of time (cycles) RUU was full
LSQ_count                  31984250 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6974 # avg LSQ occupancy (insns)
lsq_rate                     1.9311 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4324 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  153720289 # total number of slip cycles
avg_sim_slip                11.7280 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:24:05 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264681 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6350469 # total simulation time in cycles
sim_IPC                      1.8237 # instructions per cycle
sim_CPI                      0.5483 # cycles per instruction
sim_exec_BW                  1.9313 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18714627 # cumulative IFQ occupancy
IFQ_fcount                  3850497 # cumulative IFQ full count
ifq_occupancy                2.9470 # avg IFQ occupancy (insns)
ifq_rate                     1.9313 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5259 # avg IFQ occupant latency (cycles)
ifq_full                     0.6063 # fraction of time (cycles) IFQ was full
RUU_count                  77104478 # cumulative RUU occupancy
RUU_fcount                  3248577 # cumulative RUU full count
ruu_occupancy               12.1415 # avg RUU occupancy (insns)
ruu_rate                     1.9313 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2867 # avg RUU occupant latency (cycles)
ruu_full                     0.5115 # fraction of time (cycles) RUU was full
LSQ_count                  32089912 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0532 # avg LSQ occupancy (insns)
lsq_rate                     1.9313 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6164 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  123261486 # total number of slip cycles
avg_sim_slip                10.6430 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:24:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
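The cache and TLB specifications in this option dump use SimpleScalar's `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, as described in the simulator's help output. Multiplying the fields gives the size of each structure: dl1 and il1 are 512 × 64 B × 2 = 64 KiB each, and the unified ul2 is 1024 × 64 B × 8 = 512 KiB. A minimal decoding sketch follows. Reading a TLB's block size as its page size, so that capacity becomes the address range the TLB maps, is an interpretation and not something the log states.

```python
from dataclasses import dataclass

# Replacement-policy letters listed in the simulator's help output.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}

@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (treated as the page size for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        """Total blocks (or TLB entries): sets times ways."""
        return self.nsets * self.assoc

    @property
    def capacity(self) -> int:
        """Bytes held by a cache, or bytes mapped by a TLB."""
        return self.blocks * self.bsize

def parse_config(spec: str) -> CacheConfig:
    """Parse a '<name>:<nsets>:<bsize>:<assoc>:<repl>' string."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])

for spec in ["dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"]:
    c = parse_config(spec)
    print(f"{c.name}: {c.blocks} x {c.bsize} B = {c.capacity // 1024} KiB ({c.repl})")
```

For the TLBs this gives 64 itlb entries mapping 256 KiB and 128 dtlb entries mapping 512 KiB.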


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375076 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9255859 # total simulation time in cycles
sim_IPC                      1.4392 # instructions per cycle
sim_CPI                      0.6948 # cycles per instruction
sim_exec_BW                  1.4450 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  36067803 # cumulative IFQ occupancy
IFQ_fcount                  8866306 # cumulative IFQ full count
ifq_occupancy                3.8968 # avg IFQ occupancy (insn's)
ifq_rate                     1.4450 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6966 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 145182537 # cumulative RUU occupancy
RUU_fcount                  8725397 # cumulative RUU full count
ruu_occupancy               15.6855 # avg RUU occupancy (insn's)
ruu_rate                     1.4450 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.8547 # avg RUU occupant latency (cycle's)
ruu_full                     0.9427 # fraction of time (cycle's) RUU was full
LSQ_count                  76167359 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2291 # avg LSQ occupancy (insn's)
lsq_rate                     1.4450 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6947 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  241065342 # total number of slip cycles
avg_sim_slip                18.0968 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
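Most ratio statistics in the fft run can be reproduced from its raw counters. The formulas below are inferred from the stat descriptions and checked against the logged values, which they match to the four printed decimals; they are not copied from the simulator source. Note that lsq_rate equals ifq_rate and ruu_rate, so this log appears to divide every queue's cumulative count by the total executed-instruction count rather than by the number of memory operations. With those definitions each queue obeys Little's law exactly: average occupancy equals dispatch rate times average latency.

```python
# Raw counters from the fft run above.
SIM_CYCLE = 9255859
SIM_NUM_INSN = 13320908      # committed instructions
SIM_TOTAL_INSN = 13375076    # executed, including mis-speculated
SIM_NUM_BRANCHES = 387182
SIM_SLIP = 241065342

def ratios() -> dict:
    """Top-level ratio statistics recomputed from the counters."""
    return {
        "sim_IPC": SIM_NUM_INSN / SIM_CYCLE,
        "sim_CPI": SIM_CYCLE / SIM_NUM_INSN,
        "sim_exec_BW": SIM_TOTAL_INSN / SIM_CYCLE,
        "sim_IPB": SIM_NUM_INSN / SIM_NUM_BRANCHES,
        "avg_sim_slip": SIM_SLIP / SIM_NUM_INSN,
    }

def queue(count: int, full_count: int) -> dict:
    """Occupancy, rate, latency and full fraction for one queue (IFQ, RUU or LSQ)."""
    occupancy = count / SIM_CYCLE          # average entries per cycle
    rate = SIM_TOTAL_INSN / SIM_CYCLE      # average insts dispatched per cycle
    latency = count / SIM_TOTAL_INSN       # average cycles an entry stays queued
    # Little's law holds exactly for these definitions (up to float rounding).
    assert abs(occupancy - rate * latency) < 1e-9
    return {"occupancy": occupancy, "rate": rate,
            "latency": latency, "full": full_count / SIM_CYCLE}

queues = {
    "ifq": queue(36067803, 8866306),
    "ruu": queue(145182537, 8725397),
    "lsq": queue(76167359, 0),
}

for name, value in ratios().items():
    print(f"{name:14s} {value:.4f}")
for name, q in queues.items():
    print(name, {k: round(v, 4) for k, v in q.items()})
```

The RUU figures stand out: it is full about 94% of cycles with 16 entries, while the 32-entry LSQ is never full.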

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:24:24 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12564934 # total simulation time in cycles
sim_IPC                      1.7015 # instructions per cycle
sim_CPI                      0.5877 # cycles per instruction
sim_exec_BW                  1.7071 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49113574 # cumulative IFQ occupancy
IFQ_fcount                 11659221 # cumulative IFQ full count
ifq_occupancy                3.9088 # avg IFQ occupancy (insn's)
ifq_rate                     1.7071 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2897 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199590629 # cumulative RUU occupancy
RUU_fcount                 12439096 # cumulative RUU full count
ruu_occupancy               15.8847 # avg RUU occupancy (insn's)
ruu_rate                     1.7071 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3049 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  63986803 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0925 # avg LSQ occupancy (insn's)
lsq_rate                     1.7071 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9831 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291360269 # total number of slip cycles
avg_sim_slip                13.6282 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
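The filter run prints `<error: divide by zero>` for bpred_bimod.bpred_jr_non_ras_rate.PP because it saw no non-RAS JRs (jr_non_ras_seen is 0), so a post-processing script should guard that case explicitly. The sketch below recomputes the filter run's predictor rates with a hypothetical `ratio` helper that returns None for an empty denominator. It also checks two relations that hold for both the fft and filter runs in this log: bpred_bimod.misses equals updates minus dir_hits, and ul2.accesses equals dl1.misses + dl1.writebacks + il1.misses. The second is what one would expect when a unified L2 (il2 pointed at dl2, as in these runs) serves both L1 caches' misses and dirty dl1 evictions. Both relations are inferred from the numbers, not from the simulator source.

```python
from typing import Optional

def ratio(num: int, den: int) -> Optional[float]:
    """num/den, or None where the simulator prints '<error: divide by zero>'."""
    return None if den == 0 else num / den

# bpred_bimod counters from the filter run above.
bpred = {"updates": 1647342, "addr_hits": 1638967, "dir_hits": 1639058,
         "misses": 8284, "jr_hits": 45, "jr_seen": 52,
         "jr_non_ras_hits": 0, "jr_non_ras_seen": 0}

rates = {
    "bpred_addr_rate": ratio(bpred["addr_hits"], bpred["updates"]),
    "bpred_dir_rate": ratio(bpred["dir_hits"], bpred["updates"]),
    "bpred_jr_rate": ratio(bpred["jr_hits"], bpred["jr_seen"]),
    "bpred_jr_non_ras_rate": ratio(bpred["jr_non_ras_hits"], bpred["jr_non_ras_seen"]),
}
for name, r in rates.items():
    shown = "n/a (no samples)" if r is None else f"{r:.4f}"
    print(f"{name:22s} {shown}")

# Every committed branch that was not a direction hit is counted as a miss.
assert bpred["updates"] - bpred["dir_hits"] == bpred["misses"]

def expected_l2_accesses(dl1_misses: int, dl1_writebacks: int, il1_misses: int) -> int:
    """Accesses to a unified L2 shared by both L1 caches."""
    return dl1_misses + dl1_writebacks + il1_misses

# (dl1.misses, dl1.writebacks, il1.misses, ul2.accesses, ul2.misses) per run
l2_runs = {
    "fft": (287722, 143444, 727, 431893, 32394),
    "filter": (3131, 791, 179, 4101, 2489),
}
for name, (dm, dwb, im, l2_acc, l2_miss) in l2_runs.items():
    assert expected_l2_accesses(dm, dwb, im) == l2_acc
    print(f"{name}: ul2 accesses {l2_acc}, ul2 miss rate {ratio(l2_miss, l2_acc):.4f}")
```

Returning None rather than 0.0 keeps "no samples" distinguishable from "every prediction missed" when runs are compared side by side.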

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 76 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:24:37 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         76 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
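The -mem:lat and -mem:width options above set main-memory timing. In stock SimpleScalar 3.0, sim-outorder charges a block fill as the first-chunk latency plus the inter-chunk latency for every further bus-width chunk of the block. The sketch below applies that formula to the 64-byte ul2 blocks used here. It assumes the stock model carries over to this build; the -mem:maxBurstLength and -mem:minBurstLength options are not part of stock 3.0, so their effect is deliberately not modeled, and `mem_block_latency` is a hypothetical name.

```python
def mem_block_latency(blk_sz: int, first_chunk: int, inter_chunk: int,
                      bus_width: int) -> int:
    """Cycles to fetch one blk_sz-byte block from main memory.

    Stock SimpleScalar 3.0 model (assumed): the first bus-width chunk costs
    first_chunk cycles, each further chunk costs inter_chunk cycles.
    The -mem:*BurstLength options of this build are NOT modeled.
    """
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    # Number of bus transfers, rounding a partial chunk up to a whole one.
    chunks = (blk_sz + bus_width - 1) // bus_width
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l).
print(mem_block_latency(64, 76, 2, 64))  # -mem:lat 76 2, -mem:width 64 -> 76
print(mem_block_latency(64, 69, 2, 8))   # -mem:lat 69 2, -mem:width 8  -> 83
```

With a 64-byte bus the whole block moves in one chunk, so only the first-chunk latency is paid; with an 8-byte bus the same block needs eight chunks.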

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
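The range grammar above can be sketched as a small parser. This is an illustrative Python sketch, not SimpleScalar's own range parser: `parse_ptrace_range` and `Bound` are hypothetical names, numbers are assumed to be decimal or 0x-prefixed hex, a `+` end is assumed to be an offset that shares the start's kind, and program symbols such as `main` are left unresolved because resolving them needs the program's symbol table.

```python
from typing import NamedTuple, Optional, Tuple, Union

Value = Union[int, str]  # int for numbers, str for an unresolved program symbol


class Bound(NamedTuple):
    kind: str               # "addr" (@), "cycle" (#) or "insn" (no prefix)
    value: Value
    relative: bool = False  # True for a '+' end that could not be resolved


def _parse_value(text: str) -> Value:
    # Decimal or 0x-prefixed hex numbers; anything else is kept as a symbol name.
    if text.isdigit():
        return int(text)
    if text.lower().startswith("0x"):
        return int(text, 16)
    return text


def _parse_bound(text: str) -> Optional[Bound]:
    if not text:
        return None  # this end of the range was omitted
    kinds = {"@": "addr", "#": "cycle", "+": "rel"}
    if text[0] in kinds:
        return Bound(kinds[text[0]], _parse_value(text[1:]))
    return Bound("insn", _parse_value(text))


def parse_ptrace_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':' separator")
    start, end = _parse_bound(start_text), _parse_bound(end_text)
    if start is not None and start.kind == "rel":
        raise ValueError("'+' is only allowed on the end of a range")
    if end is not None and end.kind == "rel":
        if start is None:
            raise ValueError("a '+' end needs a start to be relative to")
        if isinstance(start.value, int) and isinstance(end.value, int):
            # e.g. 1000:+100 == 1000:1100
            end = Bound(start.kind, start.value + end.value)
        else:
            # Symbolic start such as @main: keep the offset, flagged as relative.
            end = Bound(start.kind, end.value, relative=True)
    return start, end


for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100", "@main:+278"):
    print(spec, parse_ptrace_range(spec))
```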

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X  (the order of the -bpred:2lev arguments)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in N, M, W, X order):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
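To make the argument order concrete, here is a hypothetical helper that builds the four -bpred:2lev values in the flag's <l1size> <l2size> <hist_size> <xor> order (N, M, W, X) for the sample schemes. It is a sketch that assumes power-of-two table sizes and treats N as a count of first-level entries, so a PAp second level that gives each entry its own 2^W counters has N * 2^W entries.

```python
def twolev_args(scheme: str, hist_width: int, l1size: int = 1,
                l2size: int = 0) -> str:
    """Return "N M W X" for -bpred:2lev (<l1size> <l2size> <hist_size> <xor>).

    Hypothetical helper; table sizes are assumed to be powers of two.
    """
    w = hist_width
    if scheme == "GAg":        # one global history, one 2^W-entry table
        n, m, x = 1, 2 ** w, 0
    elif scheme == "GAp":      # global history, larger per-address table
        if l2size <= 2 ** w:
            raise ValueError("GAp needs l2size > 2^W")
        n, m, x = 1, l2size, 0
    elif scheme == "PAg":      # per-address histories, one shared table
        n, m, x = l1size, 2 ** w, 0
    elif scheme == "PAp":      # per-address histories and tables: M = N * 2^W
        n, m, x = l1size, l1size * 2 ** w, 0
    elif scheme == "gshare":   # global history XORed with the branch address
        n, m, x = 1, 2 ** w, 1
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    return f"{n} {m} {w} {x}"


print("-bpred 2lev -bpred:2lev " + twolev_args("gshare", 10))
# -bpred 2lev -bpred:2lev 1 1024 10 1
```

The default shown in the option dump above, `1 1024 8 0`, is what `twolev_args("GAp", 8, l2size=1024)` produces.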

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
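A <config> string fully determines a cache's size: sets x ways gives the number of blocks, and blocks x block size gives the capacity. The sketch below (hypothetical names) parses the format and sizes the caches used in these runs. For a TLB it assumes <bsize> is the page size, so "blocks" are TLB entries and "capacity" is the TLB reach.

```python
from typing import NamedTuple

# Replacement-policy letters accepted in <repl>.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        # Total number of blocks (entries, for a TLB): sets x ways.
        return self.nsets * self.assoc

    @property
    def capacity(self) -> int:
        # Bytes covered: blocks x block size (TLB reach, for a TLB).
        return self.blocks * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc),
                       REPL_POLICIES[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(spec)
    print(f"{c.name}: {c.blocks} blocks, {c.capacity // 1024} KiB, {c.repl}")
# dl1: 1024 blocks, 64 KiB, LRU
# ul2: 8192 blocks, 512 KiB, LRU
# dtlb: 128 blocks, 512 KiB, LRU
```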

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
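Unification only changes which physical cache serves a level. The hypothetical resolver below follows the aliases allowed by the option help (il1 may point at dl1 or dl2, il2 only at dl2) and reports the cache name behind each level. For the runs in this log, il2 resolves to ul2, which is why the statistics report a `ul2` block and no separate il2 block.

```python
# Which option values may alias a data-side cache, per the option help above.
ALLOWED_ALIASES = {"il1": ("dl1", "dl2"), "il2": ("dl2",)}


def resolve_hierarchy(opts: dict) -> dict:
    """Map each level (dl1, dl2, il1, il2) to the name of the cache serving it."""
    served_by = {}
    # Data levels first, so instruction levels can alias them.
    for level in ("dl1", "dl2", "il1", "il2"):
        spec = opts.get(level, "none")
        if spec == "none":
            served_by[level] = None
        elif spec in ALLOWED_ALIASES.get(level, ()):
            served_by[level] = served_by[spec]     # shared (unified) cache
        elif spec in ("dl1", "dl2"):
            raise ValueError(f"-cache:{level} cannot point at {spec}")
        else:
            served_by[level] = spec.split(":")[0]  # its own <name>
    return served_by


print(resolve_hierarchy({"il1": "il1:512:64:2:l", "il2": "dl2",
                         "dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l"}))
# {'dl1': 'dl1', 'dl2': 'ul2', 'il1': 'il1', 'il2': 'ul2'}
```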



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861203 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37342369 # total simulation time in cycles
sim_IPC                      0.7461 # instructions per cycle
sim_CPI                      1.3404 # cycles per instruction
sim_exec_BW                  0.7461 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 149298505 # cumulative IFQ occupancy
IFQ_fcount                 37324386 # cumulative IFQ full count
ifq_occupancy                3.9981 # avg IFQ occupancy (insn's)
ifq_rate                     0.7461 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3587 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 597196496 # cumulative RUU occupancy
RUU_fcount                 37323662 # cumulative RUU full count
ruu_occupancy               15.9925 # avg RUU occupancy (insn's)
ruu_rate                     0.7461 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.4347 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 181533023 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8613 # avg LSQ occupancy (insn's)
lsq_rate                     0.7461 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5156 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  815238202 # total number of slip cycles
avg_sim_slip                29.2623 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
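Each statistic above is a `name value # description` line, and the derived rates are ratios of raw counters. A minimal sketch (hypothetical helper names) that parses such lines and recomputes IPC, CPI and the dl1 miss rate for this alphaBlend run. The regex and the value handling are assumptions based on the formats visible in this log; `k` sizes are read as KiB, which matches 370 pages of 4 KiB giving 1480k.

```python
import re
from typing import Dict, Optional

# "<name> <value> # <description>"; the value may be an int, a float, a hex
# address, a size such as "1480k", or "<error: divide by zero>".
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S.*?)\s+#")


def parse_value(text: str) -> Optional[float]:
    if text.startswith("<error"):
        return None                      # the simulator could not compute it
    if text.lower().startswith("0x"):
        return float(int(text, 16))
    if text.endswith("k"):
        return float(text[:-1]) * 1024   # assumed: k = 1024 bytes
    return float(text)


def parse_stats(text: str) -> Dict[str, Optional[float]]:
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = parse_value(match.group(2))
    return stats


def ratio(stats: Dict[str, Optional[float]], num: str, den: str) -> Optional[float]:
    n, d = stats.get(num), stats.get(den)
    if n is None or not d:
        return None   # like sim-outorder's "<error: divide by zero>"
    return n / d


SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  37342369 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
mem.page_mem                  1480k # total size of memory pages allocated
"""

stats = parse_stats(SAMPLE)
print(f"IPC      {ratio(stats, 'sim_num_insn', 'sim_cycle'):.4f}")   # 0.7461
print(f"CPI      {ratio(stats, 'sim_cycle', 'sim_num_insn'):.4f}")   # 1.3404
print(f"dl1 miss {ratio(stats, 'dl1.misses', 'dl1.accesses'):.4f}")  # 0.2972
```

The recomputed values match the sim_IPC, sim_CPI and dl1.miss_rate lines reported for this run.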

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:25:01 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6822716 # total simulation time in cycles
sim_IPC                      1.9211 # instructions per cycle
sim_CPI                      0.5205 # cycles per instruction
sim_exec_BW                  1.9272 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25401712 # cumulative IFQ occupancy
IFQ_fcount                  6224825 # cumulative IFQ full count
ifq_occupancy                3.7231 # avg IFQ occupancy (insn's)
ifq_rate                     1.9272 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9318 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104885675 # cumulative RUU occupancy
RUU_fcount                  5660986 # cumulative RUU full count
ruu_occupancy               15.3730 # avg RUU occupancy (insn's)
ruu_rate                     1.9272 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9767 # avg RUU occupant latency (cycle's)
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
LSQ_count                  32051338 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6977 # avg LSQ occupancy (insn's)
lsq_rate                     1.9272 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4376 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153988053 # total number of slip cycles
avg_sim_slip                11.7484 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
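Because il2 points at dl2 in both runs, the unified ul2 should see every il1 miss plus every dl1 miss and dl1 write-back. A quick consistency check, assuming SimpleScalar's write-back, write-allocate caches and no other ul2 clients, using counters copied from the alphaBlend and matrix statistics above (the helper name is hypothetical):

```python
# Counters copied from the two statistics blocks above.
RUNS = {
    "alphaBlend": {"il1.misses": 211, "dl1.misses": 2572213,
                   "dl1.writebacks": 957274, "ul2.accesses": 3529698},
    "matrix": {"il1.misses": 178, "dl1.misses": 5727,
               "dl1.writebacks": 492, "ul2.accesses": 6397},
}


def expected_l2_accesses(s: dict) -> int:
    # Each L1 miss fetches a block from ul2; each dirty dl1 eviction writes
    # one back. il1 never writes back (il1.writebacks is 0 in both runs).
    return s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]


for run, s in RUNS.items():
    print(run, expected_l2_accesses(s), s["ul2.accesses"])
# alphaBlend 3529698 3529698
# matrix 6397 6397
```

The counts match exactly in both runs. It also puts the matrix run's 0.3545 ul2 miss rate in context: it is a local rate over only 6397 ul2 accesses, because dl1 absorbs all but 5727 of about 4 million data references.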

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:25:09 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
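With -mem:lat 69 2 and -mem:width 8, the time to fill one 64-byte ul2 block from memory can be estimated with the chunked model used by stock sim-outorder: the first bus-width chunk costs <first_chunk> cycles and each further chunk costs <inter_chunk> cycles. The sketch below assumes that model. The -mem:maxBurstLength and -mem:minBurstLength options of this build are not modelled, because their exact effect is not documented in this log.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block over the memory bus, ASSUMING the stock
    sim-outorder model: first_chunk + inter_chunk * (chunks - 1).
    Burst-length limits (-mem:maxBurstLength/-mem:minBurstLength) are ignored."""
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block_size and bus_width must be positive")
    chunks = (block_size + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 block size 64 B (ul2:1024:64:8:l), -mem:width 8, -mem:lat 69 2
print(mem_access_latency(64, 8, 69, 2))  # 8 chunks: 69 + 2 * 7 = 83
```

Under this assumption, each ul2 miss in a run configured with -mem:lat 69 2 spends about 83 cycles in the memory transfer itself.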

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
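A minimal Python sketch of how a range string in this syntax splits into its two endpoints. It is not SimpleScalar's own range parser: the Pos class and the parse_pos/parse_range functions are hypothetical, numbers are read with Python's int(text, 0) (so 2000 is decimal and 0x400140 is hex, an assumption about address notation), and anything non-numeric is kept as a program symbol such as main.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Endpoint prefixes from the help text; no prefix means an instruction count.
KINDS = {"@": "addr", "#": "cycle"}


@dataclass
class Pos:
    kind: str                  # "addr", "cycle" or "inst"
    value: Union[int, str]     # a number, or a program symbol such as "main"
    relative: bool = False     # True for an unresolved "+N" end


def parse_pos(text: str, allow_plus: bool) -> Optional[Pos]:
    if not text:
        return None  # omitted endpoint
    relative, kind = False, "inst"
    if allow_plus and text[0] == "+":
        relative, text = True, text[1:]
    elif text[0] in KINDS:
        kind, text = KINDS[text[0]], text[1:]
    try:
        value: Union[int, str] = int(text, 0)  # assumption: Python literal rules
    except ValueError:
        value = text  # non-numeric: treat as a program symbol
    return Pos(kind, value, relative)


def parse_range(spec: str):
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("a range must contain ':'")
    start = parse_pos(start_text, allow_plus=False)
    end = parse_pos(end_text, allow_plus=True)
    # "+N" is relative to the start: inherit its kind, add when both are numbers
    if end is not None and end.relative and start is not None:
        if isinstance(start.value, int) and isinstance(end.value, int):
            end = Pos(start.kind, start.value + end.value)
        else:
            end = Pos(start.kind, end.value, relative=True)
    return start, end


print(parse_range("1000:+100"))
print(parse_range("@main:+278"))
```

parse_range("1000:+100") yields an instruction-count range ending at 1100, matching the 1000:+100 == 1000:1100 rule above. After a symbol, as in @main:+278, the end stays relative because only the simulator knows the symbol's address.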

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
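The sample rows above list their values in the order N, W, M, X, while the -bpred:2lev option itself takes <l1size> <l2size> <hist_size> <xor>, i.e. N, M, W, X. The helpers below (twolev_args, gag, gshare and pag are hypothetical names for this sketch) reorder a sample row into option order for the GAg, gshare and PAg rows.

```python
def twolev_args(n: int, w: int, m: int, x: int) -> str:
    """Format sample-row values (N, W, M, X) in -bpred:2lev order:
    <l1size> <l2size> <hist_size> <xor>, i.e. N M W X."""
    if x not in (0, 1):
        raise ValueError("X must be 0 (no xor) or 1 (xor)")
    return f"-bpred:2lev {n} {m} {w} {x}"


def gag(w: int) -> str:
    return twolev_args(1, w, 2 ** w, 0)     # GAg: 1, W, 2^W, 0


def gshare(w: int) -> str:
    return twolev_args(1, w, 2 ** w, 1)     # gshare: 1, W, 2^W, 1


def pag(n: int, w: int) -> str:
    return twolev_args(n, w, 2 ** w, 0)     # PAg: N, W, 2^W, 0


print(gag(10))       # -bpred:2lev 1 1024 10 0
print(gshare(12))    # -bpred:2lev 1 4096 12 1
print(pag(1024, 8))  # -bpred:2lev 1024 256 8 0
```

By the same rows, the default -bpred:2lev 1 1024 8 0 in the option dump (N=1, M=1024, W=8) has M > 2^8 = 256, so it is a GAp-style configuration.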

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

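Because <config> is colon-separated, a cache's total capacity follows directly as nsets x assoc x bsize. A minimal Python sketch (CacheConfig and parse_cache_config are hypothetical names; only the field-count and replacement-letter checks described above are made, not any further validation the simulator may do), applied to configurations used in these runs:

```python
from dataclasses import dataclass

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Bytes covered: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(text: str) -> CacheConfig:
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity // 1024, "KB", cfg.repl)
```

This gives 64 KB for dl1:512:64:2:l and 512 KB for ul2:1024:64:8:l. The TLB options use the same format with the page size in the <bsize> field, so for dtlb:32:4096:4:l the same product (512 KB) is the TLB reach rather than a data capacity.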
  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264709 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6369523 # total simulation time in cycles
sim_IPC                      1.8183 # instructions per cycle
sim_CPI                      0.5500 # cycles per instruction
sim_exec_BW                  1.9255 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18784879 # cumulative IFQ occupancy
IFQ_fcount                  3868060 # cumulative IFQ full count
ifq_occupancy                2.9492 # avg IFQ occupancy (insn's)
ifq_rate                     1.9255 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5316 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6073 # fraction of time (cycle's) IFQ was full
RUU_count                  77385633 # cumulative RUU occupancy
RUU_fcount                  3266133 # cumulative RUU full count
ruu_occupancy               12.1494 # avg RUU occupancy (insn's)
ruu_rate                     1.9255 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3096 # avg RUU occupant latency (cycle's)
ruu_full                     0.5128 # fraction of time (cycle's) RUU was full
LSQ_count                  32153311 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0480 # avg LSQ occupancy (insn's)
lsq_rate                     1.9255 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6216 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123606040 # total number of slip cycles
avg_sim_slip                10.6727 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
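The statistics section is a flat list of <name> <value> # <description> lines, and several values are ratios of others: sim_IPC equals sim_num_insn / sim_cycle, and each miss_rate is misses / accesses, as its comment says. A minimal Python sketch, assuming only statistics lines are passed in (parse_stats is a hypothetical helper), that recomputes a few of them for the sort run:

```python
import re

# A statistics line looks like: <name> <value> # <description>
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(lines):
    """Map stat names to floats; values that are not plain numbers
    (hex addresses, sizes such as 292k) are kept as strings."""
    stats = {}
    for line in lines:
        match = STAT_LINE.match(line)
        if match is None:
            continue  # blank lines, banners, multi-value option lines
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw
    return stats


# A few lines copied from the sort run above
sample = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6369523 # total simulation time in cycles
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
""".splitlines()

s = parse_stats(sample)
print(f"IPC            {s['sim_num_insn'] / s['sim_cycle']:.4f}")   # 1.8183
print(f"dl1.miss_rate  {s['dl1.misses'] / s['dl1.accesses']:.4f}")  # 0.0025
print(f"ul2.miss_rate  {s['ul2.misses'] / s['ul2.accesses']:.4f}")  # 0.2268
```

ul2.miss_rate is local to ul2. In this run its 18501 accesses are exactly the 217 il1 misses plus the 11205 dl1 misses plus the 7079 dl1 writebacks, so the 22.7% applies only to that traffic, not to all memory references.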

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:25:17 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375132 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9387256 # total simulation time in cycles
sim_IPC                      1.4190 # instructions per cycle
sim_CPI                      0.7047 # cycles per instruction
sim_exec_BW                  1.4248 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  36573399 # cumulative IFQ occupancy
IFQ_fcount                  8992705 # cumulative IFQ full count
ifq_occupancy                3.8961 # avg IFQ occupancy (insn's)
ifq_rate                     1.4248 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7344 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 147206538 # cumulative RUU occupancy
RUU_fcount                  8851782 # cumulative RUU full count
ruu_occupancy               15.6815 # avg RUU occupancy (insn's)
ruu_rate                     1.4248 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.0060 # avg RUU occupant latency (cycle's)
ruu_full                     0.9430 # fraction of time (cycle's) RUU was full
LSQ_count                  77371891 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2422 # avg LSQ occupancy (insn's)
lsq_rate                     1.4248 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7848 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  244293735 # total number of slip cycles
avg_sim_slip                18.3391 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
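The `mem.page_mem` and `*_rate` lines above are derived from the raw counters printed next to them. The Python sketch below redoes that arithmetic. It assumes 4 KiB simulator pages, which is consistent with 190 pages being reported as 760k. It also assumes that each `*_rate` is the event count divided by the reference count, as the descriptions state. The helper names are hypothetical.

```python
# Sketch: recompute derived memory/TLB figures from raw sim-outorder counters.
# Assumption: the simulator allocates target memory in 4096-byte pages.
PAGE_SIZE = 4096


def page_mem_kib(page_count: int, page_size: int = PAGE_SIZE) -> int:
    """Total allocated page memory in KiB (printed by the simulator as e.g. '760k')."""
    return page_count * page_size // 1024


def rate(events: int, refs: int) -> float:
    """Events per reference, e.g. misses/ref for *.miss_rate."""
    return events / refs


if __name__ == "__main__":
    # Counters copied from the "matrix" run above.
    print(f"mem.page_mem       {page_mem_kib(190)}k")
    print(f"dtlb.miss_rate     {rate(4144, 6740998):.4f}")
    print(f"dtlb.repl_rate     {rate(4016, 6740998):.4f}")
    print(f"mem.ptab_miss_rate {rate(3262, 156766206):.4f}")
```

The same page arithmetic also reproduces the 120k (30 pages) and 1480k (370 pages) figures reported by the later runs.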

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:25:28 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
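The cache and TLB options above use sim-outorder's `<name>:<nsets>:<bsize>:<assoc>:<repl>` format. The Python sketch below decodes those strings and computes the resulting geometry. The class and function names are hypothetical. The sketch assumes capacity is simply nsets × bsize × assoc. For a TLB, `bsize` is the page size, so that product is the TLB reach rather than storage.

```python
# Sketch: decode sim-outorder cache/TLB config strings of the form
#   <name>:<nsets>:<bsize>:<assoc>:<repl>
from dataclasses import dataclass

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        """Total blocks (or TLB entries)."""
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        """Data capacity; for a TLB this is the address range it can map."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r} in {spec!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL_POLICIES[repl])


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        c = parse_cache_config(spec)
        print(f"{c.name:5s} {c.blocks:5d} blocks/entries, "
              f"{c.capacity_bytes // 1024:4d} KiB, {c.assoc}-way {c.repl}")
```

For this configuration, the sketch gives 64 KiB 2-way L1 caches and a 512 KiB 8-way unified L2.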

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12580656 # total simulation time in cycles
sim_IPC                      1.6994 # instructions per cycle
sim_CPI                      0.5885 # cycles per instruction
sim_exec_BW                  1.7050 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49171590 # cumulative IFQ occupancy
IFQ_fcount                 11673725 # cumulative IFQ full count
ifq_occupancy                3.9085 # avg IFQ occupancy (insn's)
ifq_rate                     1.7050 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2924 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199823036 # cumulative RUU occupancy
RUU_fcount                 12453600 # cumulative RUU full count
ruu_occupancy               15.8834 # avg RUU occupancy (insn's)
ruu_rate                     1.7050 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3157 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64059687 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0919 # avg LSQ occupancy (insn's)
lsq_rate                     1.7050 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9864 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291665518 # total number of slip cycles
avg_sim_slip                13.6425 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
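Most headline figures in this statistics block are ratios of raw counters. For example, `sim_IPC` is `sim_num_insn / sim_cycle` and `bpred_dir_rate` is `dir_hits / updates`. When the denominator is zero, as for the non-RAS JR rate here, the simulator prints `<error: divide by zero>`. The Python sketch below parses the `name value # description` lines and recomputes a few of these ratios. The regular expression and helper names are illustrative assumptions, not part of SimpleScalar.

```python
# Sketch: parse sim-outorder "name  value # description" statistic lines
# and recompute a few derived metrics from the raw counters.
import re

# name, then value (may contain spaces, e.g. "<error: divide by zero>"), then "# ..."
STAT_LINE = re.compile(r"^(\S+)\s+(.+?)\s+#\s")


def parse_stats(text: str) -> dict:
    """Map statistic name -> float, or the raw string if it is not numeric."""
    stats = {}
    for line in text.splitlines():
        if line.startswith(("#", "-")):  # skip option-dump lines
            continue
        m = STAT_LINE.match(line)
        if m is None:
            continue
        name, raw = m.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. "0x00400000", "760k", "<error: divide by zero>"
    return stats


def ratio(num: float, den: float):
    """num / den, or None where the simulator would report a divide by zero."""
    return None if den == 0 else num / den


def derived(stats: dict) -> dict:
    insn, cycles = stats["sim_num_insn"], stats["sim_cycle"]
    return {
        "sim_IPC": ratio(insn, cycles),
        "sim_CPI": ratio(cycles, insn),
        "sim_IPB": ratio(insn, stats["sim_num_branches"]),
        "bpred_dir_rate": ratio(stats["bpred_bimod.dir_hits"],
                                stats["bpred_bimod.updates"]),
        "bpred_jr_non_ras_rate": ratio(stats["bpred_bimod.jr_non_ras_hits.PP"],
                                       stats["bpred_bimod.jr_non_ras_seen.PP"]),
    }


# A few lines copied from the "filter" run above.
SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_cycle                  12580656 # total simulation time in cycles
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
"""

if __name__ == "__main__":
    for name, value in derived(parse_stats(SAMPLE)).items():
        shown = "n/a" if value is None else f"{value:.4f}"
        print(f"{name:22s} {shown}")
```

With these lines as input, the sketch reproduces the reported IPC of 1.6994, CPI of 0.5885, IPB of 12.9780 and direction-prediction rate of 0.9950. It returns `None` for the non-RAS JR rate.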

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:25:41 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
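The `-mem:lat <first_chunk> <inter_chunk>` and `-mem:width` options together set the cost of filling an L2 block from main memory. The Python sketch below uses the model found in stock SimpleScalar 3.0 (the `mem_access_latency` routine in sim-outorder). In that model, the block is split into bus-width chunks. The first chunk costs `first_chunk` cycles and each further chunk costs `inter_chunk` cycles. This build also accepts `-mem:minBurstLength` and `-mem:maxBurstLength`, which are not stock options. Their effect is an unknown here and is deliberately not modeled.

```python
# Sketch of the stock SimpleScalar main-memory latency model:
#   chunks  = ceil(block_size / bus_width)
#   latency = first_chunk + inter_chunk * (chunks - 1)
# Assumption: the non-stock -mem:min/maxBurstLength options are NOT modeled.


def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block_size and bus_width must be positive")
    chunks = (block_size + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # 64-byte ul2 blocks, as configured by ul2:1024:64:8:l
    print(mem_access_latency(64, 8, 69, 2))    # -mem:lat 69 2,  -mem:width 8  -> 83
    print(mem_access_latency(64, 16, 69, 2))   # -mem:lat 69 2,  -mem:width 16 -> 75
    print(mem_access_latency(64, 8, 91, 10))   # -mem:lat 91 10, -mem:width 8  -> 161
```

Under this model, doubling the bus width halves the number of inter-chunk transfers for a 64-byte block. The first-chunk latency dominates either way.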

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861259 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37658972 # total simulation time in cycles
sim_IPC                      0.7398 # instructions per cycle
sim_CPI                      1.3517 # cycles per instruction
sim_exec_BW                  0.7398 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 150559065 # cumulative IFQ occupancy
IFQ_fcount                 37639526 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7398 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4039 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 602239037 # cumulative RUU occupancy
RUU_fcount                 37638788 # cumulative RUU full count
ruu_occupancy               15.9919 # avg RUU occupancy (insn's)
ruu_rate                     0.7398 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.6156 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 183056580 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8609 # avg LSQ occupancy (insn's)
lsq_rate                     0.7398 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5703 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  821804272 # total number of slip cycles
avg_sim_slip                29.4980 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:26:05 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
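This run sets `-mem:lat 69 2` and `-mem:width 16`, where the earlier run used `-mem:lat 91 10` and `-mem:width 8`. The sketch below estimates what that change means for one 64-byte `ul2` block fill. It assumes the stock sim-outorder memory model: the first chunk arrives after the first-chunk latency, and each additional bus-width chunk adds the inter-chunk latency. This build also accepts `-mem:minBurstLength` and `-mem:maxBurstLength`, which the stock model does not have. The sketch ignores them, so the simulator's actual figure may differ.

```python
def mem_access_latency(blk_sz: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Approximate cycles to move one block between main memory and the L2.

    Assumes the stock sim-outorder formula
        first_chunk + inter_chunk * (chunks - 1),
    with chunks = ceil(blk_sz / bus_width). Burst-length limits are ignored.
    """
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_sz + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # The ul2 block size is 64 bytes in both configurations.
    print("this run    (-mem:lat 69 2,  -mem:width 16):",
          mem_access_latency(64, 69, 2, 16), "cycles")
    print("earlier run (-mem:lat 91 10, -mem:width 8): ",
          mem_access_latency(64, 91, 10, 8), "cycles")
```

Under this model, a 64-byte fill drops from 161 cycles (8 chunks, 10 cycles apart) to 75 cycles (4 chunks, 2 cycles apart). The 12-cycle L2 hit latency and the 30-cycle TLB miss latency are separate and come on top of this.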

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6807012 # total simulation time in cycles
sim_IPC                      1.9255 # instructions per cycle
sim_CPI                      0.5193 # cycles per instruction
sim_exec_BW                  1.9317 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25344496 # cumulative IFQ occupancy
IFQ_fcount                  6210521 # cumulative IFQ full count
ifq_occupancy                3.7233 # avg IFQ occupancy (insn's)
ifq_rate                     1.9317 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9275 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104656187 # cumulative RUU occupancy
RUU_fcount                  5646682 # cumulative RUU full count
ruu_occupancy               15.3748 # avg RUU occupancy (insn's)
ruu_rate                     1.9317 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9593 # avg RUU occupant latency (cycle's)
ruu_full                     0.8295 # fraction of time (cycle's) RUU was full
LSQ_count                  31974666 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6973 # avg LSQ occupancy (insn's)
lsq_rate                     1.9317 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4317 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153682037 # total number of slip cycles
avg_sim_slip                11.7251 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
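Most ratios in the statistics above are derived from raw counters in the same block. For example, `sim_IPC` is `sim_num_insn / sim_cycle` and `dl1.miss_rate` is `dl1.misses / dl1.accesses`. The sketch below parses `name value # comment` lines and recomputes a few of those ratios as a consistency check. The helper names `parse_stats` and `recompute` are hypothetical. Non-numeric values such as `<error: divide by zero>` (reported here because `jr_non_ras_seen.PP` is 0) or `1480k` are mapped to `None` rather than guessed.

```python
def parse_stats(text: str) -> dict:
    """Parse sim-outorder statistic lines of the form 'name  value # comment'.

    Integers (including hex such as 0x00400000) become int, decimals become
    float, and anything else (e.g. '<error: divide by zero>') becomes None.
    """
    stats = {}
    for line in text.splitlines():
        if "#" not in line:
            continue  # banners and blank lines carry no statistic
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        name, raw = fields[0], " ".join(fields[1:])
        try:
            value = int(raw, 0)  # base 0 accepts plain decimal and 0x... hex
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = None
        stats[name] = value
    return stats


def recompute(s: dict) -> dict:
    """Recompute a few reported ratios from their underlying counters."""
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "bpred_bimod.bpred_dir_rate":
            s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


# Counters copied from the matrix run above.
MATRIX = """\
sim_num_insn               13107124 # total number of instructions committed
sim_num_branches            1010850 # total number of branches committed
sim_cycle                   6807012 # total simulation time in cycles
sim_IPC                      1.9255 # instructions per cycle
sim_CPI                      0.5193 # cycles per instruction
sim_IPB                     12.9664 # instruction per branch
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR rate
dl1.accesses                4013174 # total number of accesses
dl1.misses                     5727 # total number of misses
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
ul2.accesses                   6397 # total number of accesses
ul2.misses                     2268 # total number of misses
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
"""

if __name__ == "__main__":
    stats = parse_stats(MATRIX)
    for name, value in recompute(stats).items():
        print(f"{name:28s} recomputed {value:.4f}  reported {stats[name]:.4f}")
```

For this run, every recomputed ratio rounds to the reported four-decimal value. `bpred_bimod.misses` (10191) is likewise `updates - dir_hits`.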

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:26:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
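Each cache and TLB spec in the option dump above uses sim-outorder's `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, where `<repl>` is 'l' (LRU), 'f' (FIFO) or 'r' (random). Multiplying sets, ways and block size gives the cache capacity. For a TLB, whose "block" is a 4096-byte page, the same product gives its reach. The minimal parser below is a sketch with a hypothetical `CacheConfig` type. It does not handle the level pointers (`dl1`, `dl2`, `none`) that `-cache:il1` and `-cache:il2` also accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """One spec of the form <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity_bytes(self) -> int:
        # Capacity (or TLB reach) = sets * ways * block size.
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse a spec such as 'dl1:512:64:2:l'; level pointers are rejected."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for spec in ("il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l",
                 "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB")
```

For this configuration, that gives 64 KB each for `il1` and `dl1`, 512 KB for the unified `ul2`, and a reach of 256 KB (`itlb`) and 512 KB (`dtlb`).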

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264677 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6347747 # total simulation time in cycles
sim_IPC                      1.8245 # instructions per cycle
sim_CPI                      0.5481 # cycles per instruction
sim_exec_BW                  1.9321 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18704591 # cumulative IFQ occupancy
IFQ_fcount                  3847988 # cumulative IFQ full count
ifq_occupancy                2.9467 # avg IFQ occupancy (insn's)
ifq_rate                     1.9321 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5251 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6062 # fraction of time (cycle's) IFQ was full
RUU_count                  77064313 # cumulative RUU occupancy
RUU_fcount                  3246069 # cumulative RUU full count
ruu_occupancy               12.1404 # avg RUU occupancy (insn's)
ruu_rate                     1.9321 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2834 # avg RUU occupant latency (cycle's)
ruu_full                     0.5114 # fraction of time (cycle's) RUU was full
LSQ_count                  32080855 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0539 # avg LSQ occupancy (insn's)
lsq_rate                     1.9321 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6157 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123212264 # total number of slip cycles
avg_sim_slip                10.6387 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:26:21 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
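Each cache and TLB above is configured as `<name>:<nsets>:<bsize>:<assoc>:<repl>`, which gives nsets × assoc entries and a capacity of nsets × bsize × assoc bytes. The sketch below applies that arithmetic. `CacheConfig` and `parse_cache_config` are hypothetical helpers. By this arithmetic, il1 and dl1 are 512 × 64 × 2 = 64 KiB and ul2 is 1024 × 64 × 8 = 512 KiB. For a TLB the block size is the page size, so dtlb:32:4096:4:l holds 128 entries and maps 512 KiB.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    """One <name>:<nsets>:<bsize>:<assoc>:<repl> spec (hypothetical helper)."""
    name: str
    nsets: int
    bsize: int   # block size in bytes (the page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def entries(self):
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self):
        return self.entries * self.bsize


def parse_cache_config(spec):
    """Split a config string such as 'dl1:512:64:2:l' into its fields."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# Configurations taken from the option listing above.
for spec in ("il1:512:64:2:l", "dl1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(spec)
    print(f"{c.name}: {c.entries} entries, {c.capacity_bytes // 1024} KiB")
```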

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375068 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9237088 # total simulation time in cycles
sim_IPC                      1.4421 # instructions per cycle
sim_CPI                      0.6934 # cycles per instruction
sim_exec_BW                  1.4480 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  35995575 # cumulative IFQ occupancy
IFQ_fcount                  8848249 # cumulative IFQ full count
ifq_occupancy                3.8969 # avg IFQ occupancy (insn's)
ifq_rate                     1.4480 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6912 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 144893394 # cumulative RUU occupancy
RUU_fcount                  8707342 # cumulative RUU full count
ruu_occupancy               15.6860 # avg RUU occupancy (insn's)
ruu_rate                     1.4480 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.8331 # avg RUU occupant latency (cycle's)
ruu_full                     0.9427 # fraction of time (cycle's) RUU was full
LSQ_count                  75995283 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2272 # avg LSQ occupancy (insn's)
lsq_rate                     1.4480 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6819 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  240604143 # total number of slip cycles
avg_sim_slip                18.0621 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
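Most of the ratios in the statistics block are derived from raw counters. The sketch below recomputes several of them for this fft run using the following assumed definitions:

- IPC = sim_num_insn / sim_cycle
- queue occupancy = count / sim_cycle
- queue latency = count / sim_total_insn
- avg_sim_slip = sim_slip / sim_num_insn

Each definition reproduces the printed value to four decimals. The hypothetical `ratio` helper returns `None` for a zero denominator. That is the case printed as `<error: divide by zero>`, for example the filter run's non-RAS JR rate, where `jr_non_ras_seen.PP` is 0.

The sketch also checks one relationship observed in these logs rather than a documented rule: ul2.accesses equals il1.misses + dl1.misses + dl1.writebacks. Here that is 727 + 287722 + 143444 = 431893. The filter run gives 179 + 3131 + 791 = 4101.

```python
def ratio(num, den):
    """num / den, or None where sim-outorder would print '<error: divide by zero>'."""
    return None if den == 0 else num / den


def derived_stats(s):
    """Recompute selected formula stats from a dict of raw counters."""
    return {
        "sim_IPC": ratio(s["sim_num_insn"], s["sim_cycle"]),
        "sim_CPI": ratio(s["sim_cycle"], s["sim_num_insn"]),
        "sim_IPB": ratio(s["sim_num_insn"], s["sim_num_branches"]),
        "sim_exec_BW": ratio(s["sim_total_insn"], s["sim_cycle"]),
        "ifq_occupancy": ratio(s["IFQ_count"], s["sim_cycle"]),
        "ifq_latency": ratio(s["IFQ_count"], s["sim_total_insn"]),
        "ifq_full": ratio(s["IFQ_fcount"], s["sim_cycle"]),
        "ruu_occupancy": ratio(s["RUU_count"], s["sim_cycle"]),
        "ruu_latency": ratio(s["RUU_count"], s["sim_total_insn"]),
        "avg_sim_slip": ratio(s["sim_slip"], s["sim_num_insn"]),
        "bpred_dir_rate": ratio(s["bpred_bimod.dir_hits"],
                                s["bpred_bimod.updates"]),
        "bpred_jr_non_ras_rate": ratio(s["bpred_bimod.jr_non_ras_hits.PP"],
                                       s["bpred_bimod.jr_non_ras_seen.PP"]),
        # Observed in these logs, not a documented rule: the ul2 access
        # count matches il1 misses + dl1 misses + dl1 writebacks.
        "ul2_accesses_expected": (s["il1.misses"] + s["dl1.misses"]
                                  + s["dl1.writebacks"]),
    }


# Raw counters copied from the fft run above.
fft = {
    "sim_num_insn": 13320908, "sim_total_insn": 13375068,
    "sim_cycle": 9237088, "sim_num_branches": 387182,
    "IFQ_count": 35995575, "IFQ_fcount": 8848249,
    "RUU_count": 144893394, "sim_slip": 240604143,
    "bpred_bimod.dir_hits": 380716, "bpred_bimod.updates": 387182,
    "bpred_bimod.jr_non_ras_hits.PP": 0, "bpred_bimod.jr_non_ras_seen.PP": 1,
    "il1.misses": 727, "dl1.misses": 287722, "dl1.writebacks": 143444,
}

for name, value in derived_stats(fft).items():
    print(f"{name:24s} {value}")
```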

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:26:32 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
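The -mem:lat and -mem:width options set the cost of a miss in ul2. This sketch assumes the stock SimpleScalar 3.0 model, in which a block is moved in ceil(block size / bus width) chunks and each chunk after the first adds the inter-chunk latency. Under that model, a 64-byte ul2 block costs 69 + 2 × (4 - 1) = 75 cycles in this run. In the earlier matrix run (-mem:lat 91 10, -mem:width 8) it costs 91 + 10 × (8 - 1) = 161 cycles.

The -mem:maxBurstLength and -mem:minBurstLength options appear to come from a modified build and are not among the stock 3.0 options. This build may therefore charge a different latency; the sketch ignores them. `mem_access_latency` here is an illustrative re-implementation, not code taken from this build.

```python
def mem_access_latency(block_size, bus_width, first_chunk, inter_chunk):
    """Cycles to move one block over the memory bus.

    Assumes the stock SimpleScalar 3.0 model: the block is split into
    ceil(block_size / bus_width) chunks; the first chunk costs
    first_chunk cycles and each later chunk adds inter_chunk cycles.
    The burst-length options of this build are not modelled.
    """
    chunks = (block_size + bus_width - 1) // bus_width   # ceiling division
    if chunks <= 0:
        raise ValueError("block_size and bus_width must be positive")
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l) in every run shown here.
print(mem_access_latency(64, 16, 69, 2))   # this run -> 75
print(mem_access_latency(64, 8, 91, 10))   # matrix run -> 161
```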

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12562688 # total simulation time in cycles
sim_IPC                      1.7018 # instructions per cycle
sim_CPI                      0.5876 # cycles per instruction
sim_exec_BW                  1.7074 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49105286 # cumulative IFQ occupancy
IFQ_fcount                 11657149 # cumulative IFQ full count
ifq_occupancy                3.9088 # avg IFQ occupancy (insn's)
ifq_rate                     1.7074 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2893 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199557428 # cumulative RUU occupancy
RUU_fcount                 12437024 # cumulative RUU full count
ruu_occupancy               15.8849 # avg RUU occupancy (insn's)
ruu_rate                     1.7074 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3033 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  63976391 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0926 # avg LSQ occupancy (insn's)
lsq_rate                     1.7074 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9826 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291316662 # total number of slip cycles
avg_sim_slip                13.6261 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
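Every statistic above follows the layout <name> <value> # <description>. The rate lines are derived from the raw counters; for example, dl1.miss_rate is dl1.misses / dl1.accesses.

The Python sketch below assumes that layout. parse_stats and miss_rate are hypothetical helper names, not part of SimpleScalar. Values are kept as strings because the log mixes several kinds:

- integers and decimals
- hex addresses
- sizes such as 120k
- <error: ...> markers

```python
import re

# Assumed layout of a sim-outorder statistic line:
#   <name> <value> # <description>
# Option lines start with '-' or '#', so the name pattern skips them.
STAT_LINE = re.compile(
    r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>\S.*?)\s+#\s*(?P<desc>.*)$"
)


def parse_stats(text):
    """Hypothetical helper: map each stat name to its raw value string."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m.group("name")] = m.group("value")
    return stats


def miss_rate(stats, cache):
    """Recompute <cache>.miss_rate as misses / accesses."""
    accesses = float(stats[f"{cache}.accesses"])
    if accesses == 0:
        return float("nan")
    return float(stats[f"{cache}.misses"]) / accesses


# Counters copied from the run above.
sample = """\
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
"""

stats = parse_stats(sample)
print(f"dl1 miss rate: {miss_rate(stats, 'dl1'):.4f}")  # log reports 0.0006
print(f"ul2 miss rate: {miss_rate(stats, 'ul2'):.4f}")  # log reports 0.6069
```

Feeding the whole log through parse_stats would merge runs, because later runs overwrite earlier names in the dictionary. Splitting on the dashed separator lines first keeps the runs apart.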

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:26:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
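The -mem:lat and -mem:width values in these runs set how long an L2 block fill takes. The minimal Python sketch below assumes the stock SimpleScalar 3.0 sim-outorder model: the first-chunk latency, plus the inter-chunk latency for each additional bus-width chunk of the block. mem_block_latency is a hypothetical name.

This build also accepts -mem:maxBurstLength and -mem:minBurstLength. These do not appear in the stock release, so the real timing may differ from this estimate.

```python
import math


def mem_block_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to transfer one block, assuming the stock model:
    first_chunk + inter_chunk * (number_of_chunks - 1).
    Burst-length options are ignored."""
    chunks = math.ceil(block_bytes / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 uses 64-byte blocks (ul2:1024:64:8:l) in all three runs.
runs = {
    "run 1 (-mem:width 8,  -mem:lat 91 10)": (8, 91, 10),
    "run 2 (-mem:width 16, -mem:lat 69 2)": (16, 69, 2),
    "run 3 (-mem:width 32, -mem:lat 69 2)": (32, 69, 2),
}
for label, (width, first, inter) in runs.items():
    print(label, "->", mem_block_latency(64, width, first, inter), "cycles")
```

Under that assumption, the three configurations in this part of the log cost 161, 75 and 71 cycles per 64-byte ul2 block.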

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
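The range grammar above can be checked mechanically. The Python sketch below is a hypothetical parser; parse_ptrace_range is not a SimpleScalar function. It makes two assumptions:

- Numeric endpoints are Python int(text, 0) literals (decimal or 0x hex).
- Program symbols such as main are left unresolved, since resolving them needs the program's symbol table.

```python
def parse_ptrace_range(spec):
    """Hypothetical parser for '{{@|#}<start>}:{{@|#|+}<end>}'.

    Returns (start, end). Each endpoint is None (omitted) or (kind, value):
      'addr'  for a leading '@',
      'cycle' for a leading '#',
      'insn'  for no prefix (instruction count).
    Symbols stay as strings; a '+' end is made absolute relative to start.
    """
    start_s, sep, end_s = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    def value(text):
        try:
            return int(text, 0)
        except ValueError:
            return text  # program symbol such as 'main'

    def endpoint(text, allow_plus):
        if not text:
            return None
        kinds = {"@": "addr", "#": "cycle"}
        if text[0] in kinds:
            return (kinds[text[0]], value(text[1:]))
        if allow_plus and text[0] == "+":
            return ("rel", value(text[1:]))
        return ("insn", value(text))

    start = endpoint(start_s, allow_plus=False)
    end = endpoint(end_s, allow_plus=True)

    # '+<n>' is relative to the start, e.g. 1000:+100 == 1000:1100.
    if end is not None and end[0] == "rel":
        if start is None:
            raise ValueError("a relative end needs a start")
        if isinstance(start[1], int) and isinstance(end[1], int):
            end = (start[0], start[1] + end[1])
        else:
            end = (start[0], f"{start[1]}+{end[1]}")  # symbolic offset
    return start, end


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, "->", parse_ptrace_range(spec))
```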

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
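One detail is easy to miss. The sample table lists its fields as N, W, M, X, but the -bpred:2lev flag takes <l1size> <l2size> <hist_size> <xor>, which is N, M, W, X.

The Python sketch below does two things:

- It reorders the fields into flag order.
- It gives a rough classification against the sample table.

Both helper names are hypothetical. Treating PAp as any per-address configuration with M > 2^W is a simplification of the table's rule.

```python
def two_level_flag(n, w, m, xor):
    """Format fields given in sample-table order (N, W, M, X)
    as a -bpred:2lev argument, whose order is N M W X."""
    return f"-bpred:2lev {n} {m} {w} {int(bool(xor))}"


def classify(l1size, l2size, hist_size, xor):
    """Rough classification against the sample table.
    Arguments are in flag order: <l1size> <l2size> <hist_size> <xor>."""
    full = 2 ** hist_size
    if xor:
        return "gshare" if l1size == 1 and l2size == full else "other"
    if l1size == 1:  # a single global history register
        if l2size == full:
            return "GAg"
        return "GAp" if l2size > full else "other"
    if l2size == full:  # per-address history registers
        return "PAg"
    return "PAp" if l2size > full else "other"


print(two_level_flag(1, 8, 2 ** 8, 1))  # gshare with 8 history bits
print(classify(1, 1024, 8, 0))          # the default listed in this log
```

With the default shown in the option listing, -bpred:2lev 1 1024 8 0, the sketch reports GAp because 1024 > 2^8. These runs use -bpred bimod, so that setting is inactive here.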

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
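Total capacity follows from the format above as nsets * bsize * assoc bytes. The Python sketch below applies it to the configurations used in these runs. parse_cache_config and CacheConfig are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int  # block size in bytes (page size for a TLB)
    assoc: int
    repl: str   # 'l' LRU, 'f' FIFO, 'r' random

    @property
    def capacity_bytes(self):
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")  # wrong field count raises ValueError
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
    c = parse_cache_config(spec)
    print(f"{c.name}: {c.capacity_bytes // 1024} KiB, {c.assoc}-way, {c.bsize}-byte blocks")
```

It reports 64 KiB for dl1 and il1 and 512 KiB for ul2. The same format describes the TLBs, where bsize is the page size: itlb:16:4096:4:l has 64 entries and dtlb:32:4096:4:l has 128.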

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
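The runs in this part of the log use the unified-L2 arrangement (-cache:il2 dl2, with ul2 as dl2), so ul2 receives traffic from both L1 caches. Assume each L1 miss and each dl1 writeback becomes exactly one ul2 access. Then ul2.accesses should equal dl1.misses + dl1.writebacks + il1.misses.

The Python check below applies that to counters from the alphaBlend run and the matrix run that follows it in this log; both totals match exactly. expected_ul2_accesses is a hypothetical helper.

```python
def expected_ul2_accesses(dl1_misses, dl1_writebacks, il1_misses):
    """Assumed model: with il2 pointed at dl2, every L1 miss and every
    dirty dl1 eviction produces one ul2 access (no prefetch, no flushes)."""
    return dl1_misses + dl1_writebacks + il1_misses


# Counters copied from the statistics sections of this log.
runs = {
    "alphaBlend": dict(dl1_misses=2572213, dl1_writebacks=957274,
                       il1_misses=211, ul2_accesses=3529698),
    "matrix (-mem:width 32)": dict(dl1_misses=5727, dl1_writebacks=492,
                                   il1_misses=178, ul2_accesses=6397),
}
for name, r in runs.items():
    predicted = expected_ul2_accesses(r["dl1_misses"], r["dl1_writebacks"], r["il1_misses"])
    print(f"{name}: predicted {predicted}, logged {r['ul2_accesses']}")
```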



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861195 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37297140 # total simulation time in cycles
sim_IPC                      0.7470 # instructions per cycle
sim_CPI                      1.3387 # cycles per instruction
sim_exec_BW                  0.7470 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 149118425 # cumulative IFQ occupancy
IFQ_fcount                 37279366 # cumulative IFQ full count
ifq_occupancy                3.9981 # avg IFQ occupancy (insn's)
ifq_rate                     0.7470 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3522 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 596476133 # cumulative RUU occupancy
RUU_fcount                 37278644 # cumulative RUU full count
ruu_occupancy               15.9925 # avg RUU occupancy (insn's)
ruu_rate                     0.7470 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.4088 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 181315372 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8614 # avg LSQ occupancy (insn's)
lsq_rate                     0.7470 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5078 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  814300192 # total number of slip cycles
avg_sim_slip                29.2286 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
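Several headline figures above are simple ratios of counters:

- sim_IPC is sim_num_insn / sim_cycle.
- sim_CPI is its reciprocal.
- sim_IPB is sim_num_insn / sim_num_branches.
- ifq_occupancy is IFQ_count / sim_cycle.

The Python sketch below recomputes them for the alphaBlend run. It also adds a textbook two-level average memory access time (AMAT) estimate. Two parts are assumptions, not something the simulator reports:

- the AMAT model itself;
- the 75-cycle memory figure, taken from the stock block-transfer formula with a 64-byte block, a 16-byte bus and -mem:lat 69 2.

```python
# Counters copied from the alphaBlend statistics above.
sim_num_insn = 27859692
sim_num_branches = 481706
sim_cycle = 37297140
ifq_count = 149118425

ipc = sim_num_insn / sim_cycle          # sim_IPC
cpi = sim_cycle / sim_num_insn          # sim_CPI
ipb = sim_num_insn / sim_num_branches   # sim_IPB
ifq_occupancy = ifq_count / sim_cycle   # ifq_occupancy
print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  IPB {ipb:.4f}  IFQ {ifq_occupancy:.4f}")


def amat(l1_lat, l1_miss_rate, l2_lat, l2_miss_rate, mem_lat):
    """Textbook two-level average memory access time, in cycles."""
    return l1_lat + l1_miss_rate * (l2_lat + l2_miss_rate * mem_lat)


dl1_miss_rate = 2572213 / 8653398   # dl1.misses / dl1.accesses
ul2_miss_rate = 68321 / 3529698     # ul2.misses / ul2.accesses
# Assumed memory latency: 69 + 2 * (64 / 16 - 1) = 75 cycles.
data_amat = amat(3, dl1_miss_rate, 12, ul2_miss_rate, 75)
print(f"estimated data AMAT: {data_amat:.2f} cycles")
```

The ratios reproduce the logged 0.7470, 1.3387, 57.8355 and 3.9981. The AMAT comes out near 7 cycles. It ignores several effects:

- TLB misses
- writeback traffic
- the instruction fetches that share ul2
- the latency an out-of-order core hides

Treat it only as a rough sanity check.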

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:27:08 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6930681 # total simulation time in cycles
sim_IPC                      1.8912 # instructions per cycle
sim_CPI                      0.5288 # cycles per instruction
sim_exec_BW                  1.8972 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25795072 # cumulative IFQ occupancy
IFQ_fcount                  6323165 # cumulative IFQ full count
ifq_occupancy                3.7219 # avg IFQ occupancy (insn's)
ifq_rate                     1.8972 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9618 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 106463405 # cumulative RUU occupancy
RUU_fcount                  5759326 # cumulative RUU full count
ruu_occupancy               15.3612 # avg RUU occupancy (insn's)
ruu_rate                     1.8972 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0967 # avg RUU occupant latency (cycle's)
ruu_full                     0.8310 # fraction of time (cycle's) RUU was full
LSQ_count                  32578458 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7006 # avg LSQ occupancy (insn's)
lsq_rate                     1.8972 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4776 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  156091913 # total number of slip cycles
avg_sim_slip                11.9089 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
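
Every statistic above is printed as `name value # description`, and several descriptions state the ratio they come from (for example `misses/ref`). The Python sketch below parses that layout and re-derives a few `matrix` values as sanity checks. It assumes each stat fits on one line with a single-token value. Two of the checks are relations observed in this file's runs rather than documented simulator invariants: `ul2.accesses` equals `dl1.misses + dl1.writebacks + il1.misses` (a unified L2 fed by both L1s), and `dl1.misses - dl1.replacements` equals the 512 * 2 dl1 frames, because every frame was eventually filled. `parse_stats` is a hypothetical helper.

```python
import re

# One statistics line: "<name> <value> # <description>".
# Assumption: the name has no spaces and the value is a single token
# (an integer, a decimal, a hex address such as 0x00400000, or "96k").
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#\s*(.*)$")


def parse_stats(text):
    """Return {name: value} for every statistics line in `text`."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip option dumps ("-seed 1 # ..."), commented options, separators.
        if not line or line.startswith(("#", "-")):
            continue
        m = STAT_LINE.match(line)
        if not m:
            continue
        name, raw, _desc = m.groups()
        for convert in (int, float):
            try:
                stats[name] = convert(raw)
                break
            except ValueError:
                pass
        else:
            stats[name] = raw  # hex addresses, "96k", etc. stay as text
    return stats


# Lines copied from the matrix run above.
SAMPLE = """\
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
il1.misses                      178 # total number of misses
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
ul2.accesses                   6397 # total number of accesses
ld_text_base             0x00400000 # program text (code) segment base
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
"""

s = parse_stats(SAMPLE)
DL1_FRAMES = 512 * 2  # dl1:512:64:2:l -> nsets * assoc

checks = {
    "dl1 hits + misses == accesses":
        s["dl1.hits"] + s["dl1.misses"] == s["dl1.accesses"],
    "dl1.miss_rate == misses / accesses (4 d.p.)":
        round(s["dl1.misses"] / s["dl1.accesses"], 4) == s["dl1.miss_rate"],
    "ul2 accesses == dl1 misses + dl1 writebacks + il1 misses":
        s["ul2.accesses"] == s["dl1.misses"] + s["dl1.writebacks"] + s["il1.misses"],
    "dl1 misses - replacements == frames (cache fully warmed)":
        s["dl1.misses"] - s["dl1.replacements"] == DL1_FRAMES,
}
for label, ok in checks.items():
    print(f"{'ok ' if ok else 'BAD'} {label}")
print("ras_rate", round(s["bpred_bimod.ras_hits.PP"] / s["bpred_bimod.used_ras.PP"], 4))
print("ptab_miss_rate", round(s["mem.ptab_misses"] / s["mem.ptab_accesses"], 4))
```

The il1 relation differs: 178 misses against 0 replacements means the instruction working set never filled all 1024 il1 frames, so the "misses minus frames" check only applies once a cache is fully warmed.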

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 
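
The runs in this file share most flags, so the differences are easy to miss in the long option dumps. The Python sketch below compares two `sim: command line:` records. It assumes every option token starts with `-`, treats the tokens up to the next `-` as that option's arguments, and takes the last token as the benchmark name. Negative numeric arguments would break that assumption. `parse_command_line` and `diff_runs` are hypothetical helpers, not SimpleScalar tools.

```python
# Flags shared by the first run in this file (matrix) and the sort run above.
COMMON = ("-fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 "
          "-res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l "
          "-cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l "
          "-cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 "
          "-mem:maxBurstLength 8 ")
FIRST_RUN = ("sim: command line: sim-outorder " + COMMON +
             "-mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2 "
             "-redir:sim tempOutput matrix ")
SORT_RUN = ("sim: command line: sim-outorder " + COMMON +
            "-mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 "
            "-redir:sim tempOutput sort ")


def parse_command_line(line):
    """Split a 'sim: command line:' record into ({option: [args]}, program)."""
    tokens = line.split("command line:", 1)[1].split()
    body, program = tokens[1:-1], tokens[-1]  # drop "sim-outorder"
    options, current = {}, None
    for tok in body:
        if tok.startswith("-"):
            current = tok
            options[current] = []
        else:
            options[current].append(tok)
    return options, program


def diff_runs(a, b):
    """Return ((program_a, program_b), {option: (args_a, args_b)}) for changed options."""
    opts_a, prog_a = parse_command_line(a)
    opts_b, prog_b = parse_command_line(b)
    changed = {k: (opts_a.get(k), opts_b.get(k))
               for k in sorted(opts_a.keys() | opts_b.keys())
               if opts_a.get(k) != opts_b.get(k)}
    return (prog_a, prog_b), changed


programs, changed = diff_runs(FIRST_RUN, SORT_RUN)
print(programs)
for option, (old, new) in changed.items():
    print(f"{option:22s} {' '.join(old):8s} -> {' '.join(new)}")
```

For these two records, only the three memory-interface flags (`-mem:lat`, `-mem:minBurstLength`, `-mem:width`) differ. Cache, predictor, and queue settings are identical.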

sim: simulation started @ Thu Dec 15 11:27:17 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
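
Each cache and TLB in the option dump is described by a `<name>:<nsets>:<bsize>:<assoc>:<repl>` string, the format documented in the simulator's help text. The total number of frames is `nsets * assoc`, and capacity is `frames * bsize`. For a TLB, the block size is the page size (4096 here), so "capacity" is the amount of address space the TLB can map. The Python sketch below computes these figures. `CacheConfig` and `parse_cache_config` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def frames(self) -> int:
        return self.nsets * self.assoc

    @property
    def capacity(self) -> int:
        """Bytes held by a cache, or bytes of address space mapped by a TLB."""
        return self.frames * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# Configuration strings taken from the option dump above.
for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    c = parse_cache_config(spec)
    print(f"{c.name:5s} {c.frames:5d} frames {c.capacity // 1024:5d} KiB")
```

These frame counts line up with the `fft` statistics later in this file. There, `ul2.misses - ul2.replacements` is 32394 - 24202 = 8192, the number of ul2 frames, and `dtlb.misses - dtlb.replacements` is 4144 - 4016 = 128, the number of dtlb entries. Both are consistent with each frame being filled once before replacements begin.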

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264929 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6519257 # total simulation time in cycles
sim_IPC                      1.7765 # instructions per cycle
sim_CPI                      0.5629 # cycles per instruction
sim_exec_BW                  1.8813 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19336953 # cumulative IFQ occupancy
IFQ_fcount                  4006079 # cumulative IFQ full count
ifq_occupancy                2.9661 # avg IFQ occupancy (insn's)
ifq_rate                     1.8813 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5766 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6145 # fraction of time (cycle's) IFQ was full
RUU_count                  79595231 # cumulative RUU occupancy
RUU_fcount                  3404101 # cumulative RUU full count
ruu_occupancy               12.2092 # avg RUU occupancy (insn's)
ruu_rate                     1.8813 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4897 # avg RUU occupant latency (cycle's)
ruu_full                     0.5222 # fraction of time (cycle's) RUU was full
LSQ_count                  32651680 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0085 # avg LSQ occupancy (insn's)
lsq_rate                     1.8813 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6622 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126314020 # total number of slip cycles
avg_sim_slip                10.9065 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820212 # total number of accesses
il1.hits                   12819995 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820212 # total number of accesses
itlb.hits                  12820205 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917906 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
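
Most derived statistics in the `sort` block are ratios of raw counters printed in the same block. The Python sketch below re-derives them. The formulas were inferred by matching the printed values, not copied from the simulator source. For example, `ruu_latency` matches `RUU_count / sim_total_insn`. This is Little's law: average occupancy (`RUU_count / sim_cycle`) divided by dispatch rate (`sim_total_insn / sim_cycle`).

```python
# Raw counters copied from the sort statistics above.
sim_num_insn = 11_581_513      # committed instructions
sim_total_insn = 12_264_929    # executed, including mis-speculated ones
sim_num_branches = 3_128_464
sim_cycle = 6_519_257
IFQ_fcount = 4_006_079
RUU_count = 79_595_231
sim_slip = 126_314_020
bpred_updates = 3_128_464
bpred_dir_hits = 2_990_777

# Formulas inferred by matching the printed values (assumption).
derived = {
    "sim_IPC": sim_num_insn / sim_cycle,
    "sim_CPI": sim_cycle / sim_num_insn,
    "sim_exec_BW": sim_total_insn / sim_cycle,
    "sim_IPB": sim_num_insn / sim_num_branches,
    "ifq_full": IFQ_fcount / sim_cycle,
    "ruu_occupancy": RUU_count / sim_cycle,      # mean RUU entries in use
    "ruu_rate": sim_total_insn / sim_cycle,      # mean dispatch rate
    "ruu_latency": RUU_count / sim_total_insn,   # Little's law: occupancy / rate
    "avg_sim_slip": sim_slip / sim_num_insn,
    "bpred_dir_rate": bpred_dir_hits / bpred_updates,
}
for name, value in derived.items():
    print(f"{name:15s} {value:.4f}")

# bpred_bimod.misses matches updates minus direction hits in this log.
print("bpred_bimod.misses", bpred_updates - bpred_dir_hits)
```

The same `misses = updates - dir_hits` relation holds for the `fft` run below (387182 - 380716 = 6466).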

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:27:24 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
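
With `-mem:lat 69 2` and `-mem:width 32`, a 64-byte L2 block crosses the memory bus in two chunks. In the stock SimpleScalar 3.0 model, main-memory latency for a block is the first-chunk latency plus the inter-chunk latency for each additional bus-width chunk. The Python sketch below reimplements that formula under the same name as the stock `mem_access_latency` routine. This build also accepts `-mem:maxBurstLength` and `-mem:minBurstLength`, which appear to be local additions. The sketch ignores them, so its numbers may not match this simulator exactly.

```python
def mem_access_latency(block_bytes: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to fetch one block from main memory (stock 3.0 model, assumed).

    The block is split into bus-width chunks; the first chunk costs
    `first_chunk` cycles and each later chunk `inter_chunk` cycles.
    Burst-length limits of this build are NOT modelled.
    """
    chunks = -(-block_bytes // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks under the two memory configurations used in this file.
print(mem_access_latency(64, 69, 2, 32))   # -mem:lat 69 2  -mem:width 32 -> 71
print(mem_access_latency(64, 91, 10, 8))   # -mem:lat 91 10 -mem:width 8  -> 161
```

Under this model, the wider bus and shorter inter-chunk latency of the `sort`/`fft` configuration cut an L2 miss from 161 to 71 memory cycles. The L2 hit latency and any bus contention come on top of that figure.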

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375569 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10419743 # total simulation time in cycles
sim_IPC                      1.2784 # instructions per cycle
sim_CPI                      0.7822 # cycles per instruction
sim_exec_BW                  1.2837 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  40546262 # cumulative IFQ occupancy
IFQ_fcount                  9985921 # cumulative IFQ full count
ifq_occupancy                3.8913 # avg IFQ occupancy (insn's)
ifq_rate                     1.2837 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.0314 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9584 # fraction of time (cycle's) IFQ was full
RUU_count                 163110858 # cumulative RUU occupancy
RUU_fcount                  9844891 # cumulative RUU full count
ruu_occupancy               15.6540 # avg RUU occupancy (insn's)
ruu_rate                     1.2837 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.1947 # avg RUU occupant latency (cycle's)
ruu_full                     0.9448 # fraction of time (cycle's) RUU was full
LSQ_count                  86836594 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3339 # avg LSQ occupancy (insn's)
lsq_rate                     1.2837 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4922 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  269661669 # total number of slip cycles
avg_sim_slip                20.2435 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
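
Each statistic above follows a fixed `name value # description` layout, which makes the log easy to load for post-processing. The following is a minimal sketch that assumes exactly that layout. The helpers `parse_stats` and `convert` are hypothetical and not part of SimpleScalar. Values become `int` (hex addresses included) or `float`, or stay strings such as `760k` or `<error: divide by zero>`. Once the counters are loaded, a derived rate can be re-checked from them.

```python
import re

# "name   value # description"; the value may contain spaces
# (e.g. "<error: divide by zero>"), so match lazily up to the first " #"
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#\s?(.*)$")


def convert(value: str):
    """Best effort: int (decimal or 0x hex), then float, else the raw string."""
    for conv in (lambda v: int(v, 0), float):
        try:
            return conv(value)
        except ValueError:
            pass
    return value


def parse_stats(text: str) -> dict:
    """Map statistic names to values, skipping option lines ('-...' or '# -...')."""
    stats = {}
    for line in text.splitlines():
        if line.startswith(("-", "#")):
            continue
        m = STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = convert(m.group(2))
    return stats


sample = """\
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   760k # total size of memory pages allocated
"""
stats = parse_stats(sample)
# recompute dl1.miss_rate from its raw counters
print(round(stats["dl1.misses"] / stats["dl1.accesses"], 4) == stats["dl1.miss_rate"])  # True
```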

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:27:36 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
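
The `-mem:lat <first_chunk> <inter_chunk>` and `-mem:width` options above control how long a cache miss to main memory takes. As we understand the stock SimpleScalar 3.0 `sim-outorder`, a block of `bsize` bytes moves across the bus in ceil(bsize / width) chunks. The first chunk costs `first_chunk` cycles and each later chunk costs `inter_chunk` cycles. The `-mem:maxBurstLength` and `-mem:minBurstLength` options appear to come from a modified build. We do not know how they change this model, so the minimal sketch below ignores them. The function name is our own.

```python
def mem_access_latency(blk_sz: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to fetch one block from memory (stock 3.0 model, burst limits ignored)."""
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    # number of bus transfers needed to move the block, rounded up
    chunks = (blk_sz + bus_width - 1) // bus_width
    # the first chunk pays the full latency, each later chunk the inter-chunk delay
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks under the two memory configurations used in this log
print(mem_access_latency(64, 69, 2, 32))   # -mem:lat 69 2  -mem:width 32 -> 71
print(mem_access_latency(64, 91, 10, 8))   # -mem:lat 91 10 -mem:width 8  -> 161
```

Under this model, the wider bus used in this run needs only 2 transfers per 64-byte block. The `matrix` run's 8-byte bus needs 8 transfers per block.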

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle
  count range.  All other range values represent an instruction count
  range.  If the second argument starts with a `+', it is an offset
  relative to the first argument, e.g., 1000:+100 == 1000:1100.  Program
  symbols may be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
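
The range syntax described above could be parsed as in the following minimal sketch. The names `Bound`, `parse_bound` and `parse_range` are hypothetical. The sketch handles numeric values only, although the real tool also accepts program symbols such as `@main`. One behavior is an assumption of ours: an omitted or `+`-relative end takes the kind of the start.

```python
from typing import NamedTuple, Optional, Tuple


class Bound(NamedTuple):
    kind: Optional[str]   # "addr" (@), "cycle" (#), "insn" (no prefix), None if omitted
    value: Optional[int]  # None when this end of the range is omitted


PREFIX = {"@": "addr", "#": "cycle"}


def parse_bound(text: str, start: Optional[Bound] = None) -> Bound:
    """Parse one end of a range; numbers only (no symbol lookup such as @main)."""
    if not text:
        # omitted end inherits the start's kind; an omitted start has no kind
        return Bound(start.kind if start is not None else None, None)
    if text[0] == "+":
        # relative end: an offset from the start value, of the same kind
        if start is None or start.value is None:
            raise ValueError("a '+' end needs an explicit start")
        return Bound(start.kind, start.value + int(text[1:], 0))
    if text[0] in PREFIX:
        return Bound(PREFIX[text[0]], int(text[1:], 0))
    return Bound("insn", int(text, 0))


def parse_range(spec: str) -> Tuple[Bound, Bound]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    start = parse_bound(start_text)
    return start, parse_bound(end_text, start)


print(parse_range("1000:+100"))
# (Bound(kind='insn', value=1000), Bound(kind='insn', value=1100))
```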

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (arguments in the same N, M, W, X order):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
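
A small model of the second-level index makes the N, M, W, X parameters concrete. This is a sketch under stated assumptions, not SimpleScalar's code, and the class name `TwoLevel` is hypothetical. The model makes three assumptions:

- Branch addresses are shifted right by 3, assuming 8-byte PISA instructions.
- The low W bits of the index come from the branch history, XORed with address bits when X is 1.
- Address bits fill any index bits above W, so they only matter when M > 2^W.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TwoLevel:
    n: int   # N: first-level entries (history shift registers)
    m: int   # M: second-level entries (counters)
    w: int   # W: history width in bits
    x: bool  # X: XOR history with address bits (gshare) instead of concatenating
    hist: List[int] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        for size in (self.n, self.m):
            if size <= 0 or size & (size - 1):
                raise ValueError("N and M must be powers of two")
        self.hist = [0] * self.n

    @staticmethod
    def _addr(pc: int) -> int:
        return pc >> 3  # assumption: 8-byte PISA instructions

    def l2_index(self, pc: int) -> int:
        addr = self._addr(pc)
        h = self.hist[addr & (self.n - 1)]
        low = (h ^ addr) if self.x else h
        # low W bits: history (optionally XORed with the address);
        # higher bits: address, which survive the mask only when M > 2^W
        idx = (low & ((1 << self.w) - 1)) | (addr << self.w)
        return idx & (self.m - 1)

    def update_history(self, pc: int, taken: bool) -> None:
        i = self._addr(pc) & (self.n - 1)
        self.hist[i] = ((self.hist[i] << 1) | int(taken)) & ((1 << self.w) - 1)


gap = TwoLevel(n=1, m=1024, w=8, x=False)   # -bpred:2lev 1 1024 8 0
print(gap.l2_index(0x00400148))  # history 0, address bits 01 above it -> 256
```

With the option dump's default `-bpred:2lev 1 1024 8 0`, the model behaves as a GAp predictor. Eight history bits plus two address bits select one of 1024 counters.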

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
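
Cache capacity follows directly from the `<nsets>:<bsize>:<assoc>` fields as nsets * bsize * assoc bytes. The minimal sketch below parses a config string and reports that capacity; the helper names are hypothetical. For a TLB spec such as `dtlb:32:4096:4:l`, the same product gives the TLB reach, assuming `<bsize>` is the page size.

```python
from typing import NamedTuple

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total bytes held (for a TLB: bytes mapped, i.e. its reach)."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity // 1024, "KB", cfg.repl)
# dl1 64 KB LRU
# ul2 512 KB LRU
# dtlb 512 KB LRU
```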



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12704186 # total simulation time in cycles
sim_IPC                      1.6829 # instructions per cycle
sim_CPI                      0.5942 # cycles per instruction
sim_exec_BW                  1.6884 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49627430 # cumulative IFQ occupancy
IFQ_fcount                 11787685 # cumulative IFQ full count
ifq_occupancy                3.9064 # avg IFQ occupancy (insn's)
ifq_rate                     1.6884 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3136 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 201649091 # cumulative RUU occupancy
RUU_fcount                 12567560 # cumulative RUU full count
ruu_occupancy               15.8726 # avg RUU occupancy (insn's)
ruu_rate                     1.6884 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.4008 # avg RUU occupant latency (cycle's)
ruu_full                     0.9892 # fraction of time (cycle's) RUU was full
LSQ_count                  64632347 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0875 # avg LSQ occupancy (insn's)
lsq_rate                     1.6884 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0131 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  294063903 # total number of slip cycles
avg_sim_slip                13.7546 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
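
Every rate in these statistics is a ratio of raw counters. This is why `bpred_jr_non_ras_rate.PP` prints `<error: divide by zero>` for this run: `jr_non_ras_seen.PP` is 0. The minimal sketch below recomputes a few of the rates from the `filter` run's counters. The `safe_rate` helper is hypothetical; it returns `None` wherever the simulator reports the error.

```python
from typing import Optional


def safe_rate(num: float, den: float) -> Optional[float]:
    """num/den, or None where sim-outorder prints <error: divide by zero>."""
    return None if den == 0 else num / den


insn, cycles = 21379256, 12704186            # sim_num_insn, sim_cycle
print(round(safe_rate(insn, cycles), 4))     # sim_IPC -> 1.6829
print(round(safe_rate(cycles, insn), 4))     # sim_CPI -> 0.5942
print(round(safe_rate(45, 52), 4))           # bpred_jr_rate: jr_hits/jr_seen -> 0.8654
print(safe_rate(0, 0))                       # bpred_jr_non_ras_rate.PP -> None
```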

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:27:50 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
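
All three runs use `-bpred bimod` with `-bpred:bimod 16384`, that is, a table of 16384 two-bit saturating counters indexed by branch address. The following is a minimal sketch of such a predictor; the class name `Bimodal` is hypothetical. Two details are assumptions: the index function `(pc >> 3) mod size` and the weakly-not-taken initial state. SimpleScalar's own hash and initialization may differ.

```python
class Bimodal:
    """Table of 2-bit saturating counters (0-1 predict not taken, 2-3 taken)."""

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("table size must be a power of two")
        self.size = size
        self.table = [1] * size  # assumption: every counter starts weakly not-taken

    def index(self, pc: int) -> int:
        # assumption: drop the 3 offset bits of an 8-byte instruction, then mask
        return (pc >> 3) & (self.size - 1)

    def predict(self, pc: int) -> bool:
        return self.table[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        c = self.table[i]
        self.table[i] = min(c + 1, 3) if taken else max(c - 1, 0)


bp = Bimodal(16384)
pc = 0x00400140
for outcome in (True, True, False):
    print(bp.predict(pc), end=" ")
    bp.update(pc, outcome)
print(bp.predict(pc))  # False True True True
```

In this model, branches whose addresses differ by a multiple of 16384 * 8 bytes share a counter. A larger table reduces this aliasing.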

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861699 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  40146567 # total simulation time in cycles
sim_IPC                      0.6939 # instructions per cycle
sim_CPI                      1.4410 # cycles per instruction
sim_exec_BW                  0.6940 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 160463465 # cumulative IFQ occupancy
IFQ_fcount                 40115626 # cumulative IFQ full count
ifq_occupancy                3.9969 # avg IFQ occupancy (insn's)
ifq_rate                     0.6940 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.7593 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 641859002 # cumulative RUU occupancy
RUU_fcount                 40114778 # cumulative RUU full count
ruu_occupancy               15.9879 # avg RUU occupancy (insn's)
ruu_rate                     0.6940 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.0373 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 195027385 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8579 # avg LSQ occupancy (insn's)
lsq_rate                     0.6940 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.9998 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  873394822 # total number of slip cycles
avg_sim_slip                31.3498 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
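The rate lines in the cache blocks above are plain ratios of the counters printed with them (misses/ref, repls/ref, wrbks/ref), shown to four decimal places. The Python sketch below recomputes a few of them for this run from the raw counters. It also checks one relationship that holds exactly in this log: with -cache:il2 dl2, as in the option dumps here, the L2 is unified, and every ul2 access is accounted for by a dl1 miss, a dl1 writeback, or an il1 miss. The later runs satisfy the same identity (5727 + 492 + 178 = 6397 and 11205 + 7079 + 217 = 18501). The dictionary layout and the helper name `rate` are illustrative only.

```python
# Recompute a few derived cache rates for the run above from its raw counters.
# SimpleScalar prints each rate as count / accesses to four decimal places.

run = {
    "il1": {"accesses": 27860789, "misses": 211},
    "dl1": {"accesses": 8653398, "misses": 2572213,
            "replacements": 2571189, "writebacks": 957274},
    "ul2": {"accesses": 3529698, "misses": 68321,
            "replacements": 60129, "writebacks": 20012},
}


def rate(cache: str, counter: str) -> float:
    """counter / accesses for one cache, rounded like the printed stats."""
    c = run[cache]
    return round(c[counter] / c["accesses"], 4)


dl1_miss_rate = rate("dl1", "misses")      # log shows dl1.miss_rate 0.2972
dl1_wb_rate = rate("dl1", "writebacks")    # log shows dl1.wb_rate 0.1106
ul2_miss_rate = rate("ul2", "misses")      # log shows ul2.miss_rate 0.0194

# With il2 pointed at dl2 the L2 is unified, so its accesses should come from
# dl1 fills, dl1 dirty-block writebacks, and il1 fills.
l2_traffic = (run["dl1"]["misses"] + run["dl1"]["writebacks"]
              + run["il1"]["misses"])

print(dl1_miss_rate, dl1_wb_rate, ul2_miss_rate)
print(l2_traffic, l2_traffic == run["ul2"]["accesses"])
```

For this run the sum is 3529698, exactly ul2.accesses.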

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 
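This command line matches the first run in this log except for its memory options: -mem:width 64, -mem:lat 69 2, and -mem:minBurstLength 4 replace 8, 91 10, and 2. The hypothetical Python helper below splits a `sim: command line:` record into options and their arguments and reports which options differ between two runs. It assumes the simulated program is the last token and takes no arguments of its own, which holds for the runs shown. Options with negative numeric arguments would be misparsed. The command lines are shortened to their memory options for brevity.

```python
import shlex


def parse_cmdline(line):
    """Split a 'sim: command line:' record into ({option: args}, program).

    Hypothetical helper: assumes the simulated program is the last token and
    takes no arguments of its own, as in the runs logged here.
    """
    tokens = shlex.split(line.split("command line:", 1)[1])
    *opt_tokens, program = tokens[1:]  # tokens[0] is the simulator name
    opts = {}
    current = None
    for tok in opt_tokens:
        if tok.startswith("-"):
            current = tok
            opts[current] = []
        elif current is None:
            raise ValueError(f"argument {tok!r} precedes any option")
        else:
            opts[current].append(tok)
    return {k: tuple(v) for k, v in opts.items()}, program


def diff_runs(a, b):
    """Options whose arguments differ between two command lines: {opt: (a, b)}."""
    opts_a, _ = parse_cmdline(a)
    opts_b, _ = parse_cmdline(b)
    return {k: (opts_a.get(k), opts_b.get(k))
            for k in sorted(opts_a.keys() | opts_b.keys())
            if opts_a.get(k) != opts_b.get(k)}


# Memory options of the first run in this log and of the command line above.
earlier = ("sim: command line: sim-outorder -mem:maxBurstLength 8 -mem:width 8 "
           "-mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput matrix")
current_run = ("sim: command line: sim-outorder -mem:maxBurstLength 8 -mem:width 64 "
               "-mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix")

for opt, (old, new) in diff_runs(earlier, current_run).items():
    print(opt, old, "->", new)
```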

sim: simulation started @ Thu Dec 15 11:28:14 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
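The -mem:lat pair gives the latency of the first chunk of a memory transfer and of each later chunk, and -mem:width gives the bus width in bytes. The Python sketch below assumes this build keeps the stock SimpleScalar 3.0 memory model. Under that model, one L2 block fill costs first_chunk + inter_chunk * (ceil(bsize / width) - 1) cycles. The -mem:maxBurstLength and -mem:minBurstLength options do not, as far as we know, appear in the stock 3.0 release, so their effect is unknown here and the sketch ignores them.

```python
def mem_access_latency(blk_size: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to transfer one block from memory.

    Assumes the stock SimpleScalar 3.0 model: the block moves in
    ceil(blk_size / bus_width) bus-width chunks; the first costs
    first_chunk cycles and each later one inter_chunk cycles. The
    -mem:maxBurstLength / -mem:minBurstLength options are NOT modelled.
    """
    chunks = -(-blk_size // bus_width)  # integer ceiling division
    if chunks < 1:
        raise ValueError("block size must be positive")
    return first_chunk + inter_chunk * (chunks - 1)


UL2_BLOCK = 64  # bytes, from -cache:dl2 ul2:1024:64:8:l

this_run = mem_access_latency(UL2_BLOCK, 69, 2, 64)     # -mem:lat 69 2, -mem:width 64
earlier_run = mem_access_latency(UL2_BLOCK, 91, 10, 8)  # -mem:lat 91 10, -mem:width 8
print(this_run, earlier_run)
```

Under that assumption, widening the bus from 8 to 64 bytes lets a 64-byte L2 block arrive in a single chunk. The fill drops from 91 + 10 * 7 = 161 cycles to 69.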

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6795234 # total simulation time in cycles
sim_IPC                      1.9289 # instructions per cycle
sim_CPI                      0.5184 # cycles per instruction
sim_exec_BW                  1.9350 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25301584 # cumulative IFQ occupancy
IFQ_fcount                  6199793 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insn's)
ifq_rate                     1.9350 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9242 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104484071 # cumulative RUU occupancy
RUU_fcount                  5635954 # cumulative RUU full count
ruu_occupancy               15.3761 # avg RUU occupancy (insn's)
ruu_rate                     1.9350 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9462 # avg RUU occupant latency (cycle's)
ruu_full                     0.8294 # fraction of time (cycle's) RUU was full
LSQ_count                  31917162 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6970 # avg LSQ occupancy (insn's)
lsq_rate                     1.9350 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4273 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153452525 # total number of slip cycles
avg_sim_slip                11.7076 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
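The pipeline and predictor figures in the run above are derived from its cumulative counters. The formulas in the Python sketch below were inferred by recomputing the printed values, and they reproduce them to four decimals. Occupancy divides a cumulative count by cycles, latency divides it by instructions executed (committed plus mis-speculated), and "full" divides the full-count by cycles. The sketch also mirrors the `<error: divide by zero>` entry, which appears here because no non-RAS JRs were seen. The dictionary keys and the helper `ratio` are illustrative names, not sim-outorder identifiers.

```python
# Raw counters copied from the run above (matrix, -mem:lat 69 2 -mem:width 64).
s = {
    "sim_num_insn": 13107124,     # committed instructions
    "sim_total_insn": 13148990,   # executed, including mis-speculated
    "sim_cycle": 6795234,
    "sim_num_branches": 1010850,
    "RUU_count": 104484071,       # cumulative RUU occupancy
    "RUU_fcount": 5635954,        # cycles the RUU was full
    "bpred_updates": 1010850,
    "bpred_dir_hits": 1000659,
    "jr_non_ras_hits": 0,
    "jr_non_ras_seen": 0,
}


def ratio(num, den):
    """num/den, or None where sim-outorder prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


derived = {
    "sim_IPC": ratio(s["sim_num_insn"], s["sim_cycle"]),
    "sim_CPI": ratio(s["sim_cycle"], s["sim_num_insn"]),
    "sim_exec_BW": ratio(s["sim_total_insn"], s["sim_cycle"]),
    "sim_IPB": ratio(s["sim_num_insn"], s["sim_num_branches"]),
    # Little's law: occupancy = dispatch rate * latency (1.9350 * 7.9462 ~ 15.376).
    "ruu_occupancy": ratio(s["RUU_count"], s["sim_cycle"]),
    "ruu_latency": ratio(s["RUU_count"], s["sim_total_insn"]),
    "ruu_full": ratio(s["RUU_fcount"], s["sim_cycle"]),
    "bpred_dir_rate": ratio(s["bpred_dir_hits"], s["bpred_updates"]),
    "bpred_jr_non_ras_rate": ratio(s["jr_non_ras_hits"], s["jr_non_ras_seen"]),
}

for name, value in derived.items():
    shown = "<error: divide by zero>" if value is None else f"{value:.4f}"
    print(f"{name:24s}{shown}")
```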

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:28:22 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
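Each cache above is specified as <name>:<nsets>:<bsize>:<assoc>:<repl>, the format documented in sim-outorder's own help text. Its capacity is therefore nsets * bsize * assoc bytes. The small Python sketch below parses such specs and reports their size. The class name `CacheConfig` is illustrative, not part of SimpleScalar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # bytes per block
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @classmethod
    def parse(cls, spec: str) -> "CacheConfig":
        # Raises ValueError if the spec does not have exactly five fields.
        name, nsets, bsize, assoc, repl = spec.split(":")
        if repl not in ("l", "f", "r"):
            raise ValueError(f"unknown replacement policy {repl!r}")
        return cls(name, int(nsets), int(bsize), int(assoc), repl)

    @property
    def num_blocks(self) -> int:
        return self.nsets * self.assoc

    @property
    def capacity_bytes(self) -> int:
        return self.num_blocks * self.bsize


for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
    c = CacheConfig.parse(spec)
    print(c.name, c.capacity_bytes // 1024, "KiB,", c.num_blocks, "blocks")
```

This gives 64 KiB (1024 blocks) for each L1 and 512 KiB (8192 blocks) for ul2. In the two runs in this excerpt that report ul2.replacements as 0, the ul2 miss counts (2268 and 4196) stay well below 8192 blocks. That is consistent with the L2 never having to evict a block.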

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264653 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6331415 # total simulation time in cycles
sim_IPC                      1.8292 # instructions per cycle
sim_CPI                      0.5467 # cycles per instruction
sim_exec_BW                  1.9371 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18644375 # cumulative IFQ occupancy
IFQ_fcount                  3832934 # cumulative IFQ full count
ifq_occupancy                2.9447 # avg IFQ occupancy (insn's)
ifq_rate                     1.9371 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5202 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6054 # fraction of time (cycle's) IFQ was full
RUU_count                  76823323 # cumulative RUU occupancy
RUU_fcount                  3231021 # cumulative RUU full count
ruu_occupancy               12.1337 # avg RUU occupancy (insn's)
ruu_rate                     1.9371 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2638 # avg RUU occupant latency (cycle's)
ruu_full                     0.5103 # fraction of time (cycle's) RUU was full
LSQ_count                  32026513 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0583 # avg LSQ occupancy (insn's)
lsq_rate                     1.9371 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6113 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122916932 # total number of slip cycles
avg_sim_slip                10.6132 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
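Each rate in the statistics above is a ratio of two raw counters, as its comment states (misses/ref, addr-hits/updates, JR addr-hits/JRs seen, RAS hits/used RAS). The following is a minimal Python sketch that recomputes several of them from the counters printed for this run, rounding to four places as the log does. The helper name `rate` and the dictionary keys are illustrative, not part of SimpleScalar.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator/denominator rounded to four places, as the log prints it."""
    if denominator == 0:
        raise ZeroDivisionError("rate undefined for a zero denominator")
    return round(numerator / denominator, 4)


# Raw counters copied from the statistics block above.
counters = {
    "il1.misses": 217,
    "dl1.misses": 11205, "dl1.accesses": 4497784, "dl1.writebacks": 7079,
    "ul2.misses": 4196, "ul2.accesses": 18501,
    "addr_hits": 2984765, "dir_hits": 2990777, "updates": 3128464,
    "jr_hits": 427904, "jr_seen": 434901,
    "ras_hits": 426681, "used_ras": 426708,
}

derived = {
    # cache miss rate = misses / references
    "dl1.miss_rate": rate(counters["dl1.misses"], counters["dl1.accesses"]),
    "ul2.miss_rate": rate(counters["ul2.misses"], counters["ul2.accesses"]),
    # predictor rates are normalised by updates (committed branches), not lookups
    "bpred_addr_rate": rate(counters["addr_hits"], counters["updates"]),
    "bpred_dir_rate": rate(counters["dir_hits"], counters["updates"]),
    "bpred_jr_rate": rate(counters["jr_hits"], counters["jr_seen"]),
    "ras_rate": rate(counters["ras_hits"], counters["used_ras"]),
}
for name, value in derived.items():
    print(f"{name:<18}{value:.4f}")

# In this run every ul2 access is an L1 miss or a dl1 writeback.
l1_traffic = counters["dl1.misses"] + counters["dl1.writebacks"] + counters["il1.misses"]
print(l1_traffic == counters["ul2.accesses"])  # True
```

The last check shows why ul2.miss_rate (0.2268) is a local rate: it is measured per L2 access, and those accesses are only the 18501 requests that L1 passed down, so the run still made just 4196 trips to memory.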

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:28:30 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
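The -mem:lat and -mem:width options above together set the cost of a miss that goes past the L2. The following is a minimal sketch, assuming the chunked formula used by stock SimpleScalar 3.0 sim-outorder: a block crosses the bus in ceil(block size / bus width) chunks, the first chunk costs <first_chunk> cycles and each later one <inter_chunk>. The -mem:minBurstLength and -mem:maxBurstLength options of this build are not modelled, so treat the numbers as an approximation. The function name is illustrative.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from memory (stock SimpleScalar 3.0 style).

    Assumption: burst-length limits of this modified build are ignored.
    """
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_size // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks under the two memory configurations seen in this log.
print(mem_access_latency(64, 64, 69, 2))   # -mem:width 64 -mem:lat 69 2  -> 69
print(mem_access_latency(64, 8, 91, 10))   # -mem:width 8  -mem:lat 91 10 -> 161
```

Under this assumption, the 64-byte bus used here returns a whole L2 block in a single chunk, while the 8-byte bus of the earlier matrix run needs eight chunks.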

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if prefixed with a `+', is a value relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
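The range grammar above can be sketched as a small parser. This is a minimal Python illustration of the stated rules, not SimpleScalar's own parser. The names `Bound` and `parse_range` are hypothetical. Program symbols such as main are resolved by the simulator against the program's symbol table; here they are simply kept as strings.

```python
from typing import NamedTuple, Optional, Tuple, Union

Value = Union[int, str]


class Bound(NamedTuple):
    kind: str               # "addr" ('@'), "cycle" ('#') or "insn" (no prefix)
    value: Optional[Value]  # None if this end is omitted; str = unresolved symbol


_KINDS = {"@": "addr", "#": "cycle"}


def _value(text: str) -> Value:
    try:
        return int(text, 0)  # decimal or 0x-prefixed hex
    except ValueError:
        return text          # assume a program symbol such as "main"


def _bound(text: str) -> Bound:
    if not text:
        return Bound("insn", None)  # omitted end; kind is irrelevant here
    if text[0] in _KINDS:
        return Bound(_KINDS[text[0]], _value(text[1:]))
    return Bound("insn", _value(text))


def parse_range(spec: str) -> Tuple[Bound, Bound]:
    """Split a pipetrace range such as '#0:#1000' or '@main:+278'."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = _bound(start_text)
    if not end_text.startswith("+"):
        return start, _bound(end_text)
    # A '+' end is relative to the start and shares its kind.
    if start.value is None:
        raise ValueError("a '+' end needs an explicit start")
    offset = _value(end_text[1:])
    if isinstance(start.value, int) and isinstance(offset, int):
        return start, Bound(start.kind, start.value + offset)
    # Symbolic start (e.g. @main): leave the sum for symbol resolution.
    return start, Bound(start.kind, f"{start.value}+{offset}")


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, parse_range(spec))
```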

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
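The -bpred:2lev option takes its values in the order <l1size> <l2size> <hist_size> <xor>, i.e. N M W X. The following is a minimal sketch that formats such a value and names the textbook scheme it corresponds to under the rules listed above. The function names are hypothetical, and the power-of-two check on both table sizes is an assumption. Note that these runs select -bpred bimod, so the recorded -bpred:2lev value (1 1024 8 0) is not actually used.

```python
def twolev_option(n: int, m: int, w: int, x: int) -> str:
    """Format a -bpred:2lev value in the order <l1size> <l2size> <hist_size> <xor>."""
    for name, size in (("l1size", n), ("l2size", m)):
        # Assumption: both table sizes must be non-zero powers of two.
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{name} must be a power of two, got {size}")
    if x not in (0, 1):
        raise ValueError("xor flag must be 0 or 1")
    return f"-bpred:2lev {n} {m} {w} {x}"


def classify(n: int, m: int, w: int, x: int) -> str:
    """Name the scheme an (N, M, W, X) configuration corresponds to."""
    if m < 2 ** w:
        return "other"  # fewer counters than history patterns
    per_address_tables = m > 2 ** w
    if x == 1:
        return "gshare" if n == 1 and not per_address_tables else "other"
    if n == 1:
        return "GAp" if per_address_tables else "GAg"
    # Looser than the help text's M == 2^(N+W): any M > 2^W counts as PAp here.
    return "PAp" if per_address_tables else "PAg"


# The -bpred:2lev value recorded in this log: N=1, M=1024, W=8, X=0.
print(twolev_option(1, 1024, 8, 0), "->", classify(1, 1024, 8, 0))
```

With M = 1024 counters and only 2^8 = 256 history patterns, the spare index bits come from the branch address, which is why the recorded configuration classifies as GAp rather than GAg.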

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
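The <name>:<nsets>:<bsize>:<assoc>:<repl> format above fixes both a cache's capacity (nsets x bsize x assoc) and how an address is split into tag, set index and block offset. The following is a minimal Python sketch of that decoding. `CacheConfig` and `parse_cache` are hypothetical names, and requiring nsets and bsize to be powers of two is an assumption made so that the bit split is well defined.

```python
from typing import NamedTuple

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size when describing a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total bytes covered: sets x ways x block size."""
        return self.nsets * self.bsize * self.assoc

    def decompose(self, addr: int) -> tuple:
        """Return (tag, set index, block offset) for a byte address."""
        offset_bits = self.bsize.bit_length() - 1
        index_bits = self.nsets.bit_length() - 1
        return (addr >> (offset_bits + index_bits),
                (addr >> offset_bits) & (self.nsets - 1),
                addr & (self.bsize - 1))


def parse_cache(spec: str) -> CacheConfig:
    name, nsets, bsize, assoc, repl = spec.split(":")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])
    for field in ("nsets", "bsize"):
        value = getattr(cfg, field)
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{field} must be a power of two, got {value}")
    return cfg


dl1 = parse_cache("dl1:512:64:2:l")
ul2 = parse_cache("ul2:1024:64:8:l")
print(dl1.capacity // 1024, "KB", ul2.capacity // 1024, "KB")  # 64 KB 512 KB
print(dl1.decompose(0x10000040))  # (8192, 1, 0)
```

The same format describes the TLBs. For dtlb:32:4096:4:l, nsets x assoc gives 128 entries, and `capacity` gives their 512 KB reach.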

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
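Unification works by aliasing: an instruction level set to "dl1" or "dl2" is served by the data cache object at that level. The following is a minimal sketch of that resolution, allowing only the pointers listed in the -cache:il1 and -cache:il2 option comments ({dl1|dl2} and {dl2} respectively). `resolve_hierarchy` is a hypothetical helper. Unlike sim-outorder, it treats a missing option as none rather than applying the simulator's defaults.

```python
from typing import Dict, Optional

# Data-cache levels each instruction level may point at, per the option help.
ALLOWED_POINTERS = {"il1": ("dl1", "dl2"), "il2": ("dl2",)}


def resolve_hierarchy(opts: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each cache level to the name of the cache that actually serves it."""
    served: Dict[str, Optional[str]] = {}
    for level in ("dl1", "dl2", "il1", "il2"):
        value = opts.get(level, "none")
        if value == "none":
            served[level] = None
        elif value in ("dl1", "dl2"):
            if value not in ALLOWED_POINTERS.get(level, ()):
                raise ValueError(f"-cache:{level} cannot point at {value}")
            if served[value] is None:
                raise ValueError(f"-cache:{level} points at undefined {value}")
            served[level] = served[value]
        else:
            served[level] = value.split(":", 1)[0]  # the <name> field
    return served


this_run = {"dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l",
            "il1": "il1:512:64:2:l", "il2": "dl2"}
print(resolve_hierarchy(this_run))
```

For the options recorded in this log, il2 resolves to ul2. This is consistent with the statistics, which report il1, dl1 and ul2 blocks but no separate il2 block.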



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375020 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9124462 # total simulation time in cycles
sim_IPC                      1.4599 # instructions per cycle
sim_CPI                      0.6850 # cycles per instruction
sim_exec_BW                  1.4658 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  35562210 # cumulative IFQ occupancy
IFQ_fcount                  8739908 # cumulative IFQ full count
ifq_occupancy                3.8975 # avg IFQ occupancy (insn's)
ifq_rate                     1.4658 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6589 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 143158534 # cumulative RUU occupancy
RUU_fcount                  8599012 # cumulative RUU full count
ruu_occupancy               15.6895 # avg RUU occupancy (insn's)
ruu_rate                     1.4658 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.7034 # avg RUU occupant latency (cycle's)
ruu_full                     0.9424 # fraction of time (cycle's) RUU was full
LSQ_count                  74962824 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2156 # avg LSQ occupancy (insn's)
lsq_rate                     1.4658 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6047 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  237836944 # total number of slip cycles
avg_sim_slip                17.8544 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
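The pipeline figures for the fft run are all ratios of a few cumulative counters. The following is a minimal Python sketch that rederives them from the raw values printed above. It assumes, consistent with the printed numbers, that sim-outorder divides the cumulative RUU occupancy by cycles to get ruu_occupancy and by executed instructions to get ruu_latency. The variable names mirror the log, and the helper `r4` is illustrative.

```python
import math

# Raw counters copied from the fft run's statistics above.
sim_num_insn = 13320908      # committed instructions
sim_total_insn = 13375020    # executed instructions, including mis-speculated ones
sim_num_branches = 387182
sim_cycle = 9124462
RUU_count = 143158534        # RUU occupancy summed over every cycle
RUU_fcount = 8599012         # cycles in which the RUU was full
sim_slip = 237836944


def r4(x: float) -> float:
    """Round to four places, matching the log's formatting."""
    return round(x, 4)


metrics = {
    "sim_IPC": r4(sim_num_insn / sim_cycle),
    "sim_CPI": r4(sim_cycle / sim_num_insn),
    "sim_IPB": r4(sim_num_insn / sim_num_branches),
    "sim_exec_BW": r4(sim_total_insn / sim_cycle),
    "ruu_occupancy": r4(RUU_count / sim_cycle),     # average entries in use
    "ruu_latency": r4(RUU_count / sim_total_insn),  # average cycles per entry
    "ruu_full": r4(RUU_fcount / sim_cycle),
    "avg_sim_slip": r4(sim_slip / sim_num_insn),
}
for name, value in metrics.items():
    print(f"{name:<14}{value:>10.4f}")

# Little's law: occupancy = dispatch rate x latency, exact before rounding.
dispatch_rate = sim_total_insn / sim_cycle
latency = RUU_count / sim_total_insn
print(math.isclose(dispatch_rate * latency, RUU_count / sim_cycle))  # True
```

With a 16-entry RUU averaging 15.69 entries and full 94% of the time, the window rather than the memory system looks like the first bottleneck for this run, though that reading is an inference from these counters alone.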

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:28:41 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12549212 # total simulation time in cycles
sim_IPC                      1.7036 # instructions per cycle
sim_CPI                      0.5870 # cycles per instruction
sim_exec_BW                  1.7093 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49055558 # cumulative IFQ occupancy
IFQ_fcount                 11644717 # cumulative IFQ full count
ifq_occupancy                3.9091 # avg IFQ occupancy (insn's)
ifq_rate                     1.7093 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2870 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199358222 # cumulative RUU occupancy
RUU_fcount                 12424592 # cumulative RUU full count
ruu_occupancy               15.8861 # avg RUU occupancy (insn's)
ruu_rate                     1.7093 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2940 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63913919 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0931 # avg LSQ occupancy (insn's)
lsq_rate                     1.7093 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9797 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291055020 # total number of slip cycles
avg_sim_slip                13.6139 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              52 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:28:55 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addrs (multiple uses OK)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  If
  the second argument is prefixed with a `+', it is relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
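The range grammar can be illustrated with a small parser. This is a sketch, not SimpleScalar's own range code: it assumes a `+' end inherits the kind (`@', `#', or instruction count) of the start, accepts decimal or 0x hex numbers, and leaves program symbols such as main unresolved, because resolving them needs the program's symbol table. Pos, parse_pos and parse_range are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple, Union


class Pos(NamedTuple):
    kind: str               # "addr" for '@', "cycle" for '#', "insn" otherwise
    value: Union[int, str]  # a number, or an unresolved program symbol such as "main"


def parse_pos(text: str) -> Optional[Pos]:
    """Parse one end of a range; an empty string means that end was omitted."""
    if not text:
        return None
    kind = {"@": "addr", "#": "cycle"}.get(text[0], "insn")
    body = text if kind == "insn" else text[1:]
    try:
        value: Union[int, str] = int(body, 0)  # decimal, or 0x-prefixed hex
    except ValueError:
        value = body                            # not a number: keep as a symbol
    return Pos(kind, value)


def parse_range(spec: str) -> Tuple[Optional[Pos], Optional[Pos]]:
    """Split a {{@|#}<start>}:{{@|#|+}<end>} range into its two ends."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = parse_pos(start_text)
    if not end_text.startswith("+"):
        return start, parse_pos(end_text)
    # A '+' end is an offset from the start; assume it inherits the start's kind.
    if start is None:
        raise ValueError("a relative ('+') end needs an explicit start")
    delta = int(end_text[1:], 0)
    if isinstance(start.value, int):
        return start, Pos(start.kind, start.value + delta)
    return start, Pos(start.kind, f"{start.value}+{delta}")  # symbol left unresolved
```

For example, parse_range("1000:+100") yields the same two ends as parse_range("1000:1100"), matching the equivalence stated in the help text.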

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X  (the order taken by -bpred:2lev)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in N, M, W, X order):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N*2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
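To make the N, M, W, X parameters concrete, the sketch below shows one way a branch PC and its history register could be combined into a second-level counter index. It is modeled on the description above rather than copied from bpred.c; the 3-bit PC shift (assuming 8-byte PISA instructions) and the helper names twolev_index and update_history are assumptions. Arguments follow the -bpred:2lev <l1size> <l2size> <hist_size> <xor> order.

```python
from typing import List, Tuple

BR_SHIFT = 3  # assumption: PISA instructions are 8 bytes, so the low 3 PC bits are dropped


def twolev_index(pc: int, history: List[int], l1size: int, l2size: int,
                 hist_width: int, use_xor: bool) -> Tuple[int, int]:
    """Return (first-level index, second-level counter index) for the branch at pc.

    l1size and l2size are assumed to be powers of two.
    """
    word = pc >> BR_SHIFT
    l1 = word & (l1size - 1)                 # which shift register (always 0 when N == 1)
    mask = (1 << hist_width) - 1
    hist = history[l1] & mask
    low = (hist ^ word) & mask if use_xor else hist  # X = 1 gives gshare-style hashing
    # Address bits above the W history bits select among tables when M > 2^W.
    l2 = (low | (word << hist_width)) & (l2size - 1)
    return l1, l2


def update_history(history: List[int], l1: int, taken: bool, hist_width: int) -> None:
    """Shift the resolved branch outcome into the selected history register."""
    history[l1] = ((history[l1] << 1) | int(taken)) & ((1 << hist_width) - 1)
```

In this sketch, when l2size equals l1size * 2^W, the address bits placed above the history are exactly the first-level index bits, so each history register ends up with a private 2^W-entry table (the PAp arrangement). With the default 1 1024 8 0 setting, two address bits instead pick one of four shared 256-entry tables.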

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
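Because a cache's data capacity is nsets x bsize x assoc, the configurations on the command lines above can be sized directly. The sketch below parses a <config> string and also splits an address into tag, set index, and block offset. It assumes power-of-two set counts and block sizes and ignores tag and state storage; CacheConfig and parse_cache_config are hypothetical names.

```python
from typing import NamedTuple, Tuple

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int  # block size in bytes
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Data capacity in bytes: sets x ways x block size (tags/state not counted)."""
        return self.nsets * self.assoc * self.bsize

    def split(self, addr: int) -> Tuple[int, int, int]:
        """Split an address into (tag, set index, block offset)."""
        offset_bits = self.bsize.bit_length() - 1
        index_bits = self.nsets.bit_length() - 1
        offset = addr & (self.bsize - 1)
        index = (addr >> offset_bits) & (self.nsets - 1)
        tag = addr >> (offset_bits + index_bits)
        return tag, index, offset


def parse_cache_config(text: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>, e.g. "dl1:512:64:2:l"."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    sizes = [int(nsets), int(bsize), int(assoc)]
    # This sketch rejects set counts and block sizes that are not powers of two.
    if any(n <= 0 or n & (n - 1) for n in sizes[:2]):
        raise ValueError("nsets and bsize must be powers of two")
    return CacheConfig(name, sizes[0], sizes[1], sizes[2], REPL[repl])
```

Under these assumptions, dl1:512:64:2:l and il1:512:64:2:l are 64 KB 2-way caches with 64-byte blocks, and ul2:1024:64:8:l is a 512 KB 8-way unified L2. For a TLB config such as dtlb:32:4096:4:l, the block size is the page size, so the same product gives the TLB reach (512 KB) rather than a storage size.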



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861147 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37025766 # total simulation time in cycles
sim_IPC                      0.7524 # instructions per cycle
sim_CPI                      1.3290 # cycles per instruction
sim_exec_BW                  0.7525 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 148037945 # cumulative IFQ occupancy
IFQ_fcount                 37009246 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insns)
ifq_rate                     0.7525 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3134 # avg IFQ occupant latency (cycles)
ifq_full                     0.9996 # fraction of time (cycles) IFQ was full
RUU_count                 592153955 # cumulative RUU occupancy
RUU_fcount                 37008536 # cumulative RUU full count
ruu_occupancy               15.9930 # avg RUU occupancy (insns)
ruu_rate                     0.7525 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.2538 # avg RUU occupant latency (cycles)
ruu_full                     0.9995 # fraction of time (cycles) RUU was full
LSQ_count                 180009466 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8617 # avg LSQ occupancy (insns)
lsq_rate                     0.7525 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4609 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  808672132 # total number of slip cycles
avg_sim_slip                29.0266 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen              81 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
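The summary ratios in each statistics block are derived from the cumulative counters. The sketch below recomputes them for the alphaBlend run. The divisors (for example, executed rather than committed instructions for ifq_latency and ruu_latency, but committed instructions for avg_sim_slip) were chosen because they reproduce the printed four-decimal values; they are inferred from these numbers, not taken from the sim-outorder source. RAW, REPORTED and derived are hypothetical names.

```python
from typing import Dict

# Raw counters copied from the alphaBlend run above.
RAW: Dict[str, int] = {
    "sim_num_insn": 27859692,    # committed instructions
    "sim_total_insn": 27861147,  # executed instructions, including mis-speculated ones
    "sim_num_branches": 481706,
    "sim_cycle": 37025766,
    "IFQ_count": 148037945,
    "IFQ_fcount": 37009246,
    "RUU_count": 592153955,
    "sim_slip": 808672132,
    "bpred_bimod.updates": 481706,
    "bpred_bimod.addr_hits": 481471,
    "dl1.accesses": 8653398,
    "dl1.misses": 2572213,
    "ul2.accesses": 3529698,
    "ul2.misses": 68321,
    "mem.ptab_accesses": 152017734,
    "mem.ptab_misses": 2888570,
}

# The same run's printed (four-decimal) values, for comparison.
REPORTED: Dict[str, float] = {
    "sim_IPC": 0.7524, "sim_CPI": 1.3290, "sim_exec_BW": 0.7525, "sim_IPB": 57.8355,
    "ifq_occupancy": 3.9982, "ifq_full": 0.9996, "ifq_latency": 5.3134,
    "ruu_occupancy": 15.9930, "ruu_latency": 21.2538, "avg_sim_slip": 29.0266,
    "bpred_addr_rate": 0.9995, "dl1.miss_rate": 0.2972, "ul2.miss_rate": 0.0194,
    "mem.ptab_miss_rate": 0.0190,
}


def derived(r: Dict[str, int]) -> Dict[str, float]:
    """Recompute the summary ratios from the cumulative counters."""
    cycles, committed, executed = r["sim_cycle"], r["sim_num_insn"], r["sim_total_insn"]
    return {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "sim_exec_BW": executed / cycles,
        "sim_IPB": committed / r["sim_num_branches"],
        "ifq_occupancy": r["IFQ_count"] / cycles,   # average entries per cycle
        "ifq_full": r["IFQ_fcount"] / cycles,       # fraction of cycles the IFQ was full
        "ifq_latency": r["IFQ_count"] / executed,   # average cycles per executed insn
        "ruu_occupancy": r["RUU_count"] / cycles,
        "ruu_latency": r["RUU_count"] / executed,
        "avg_sim_slip": r["sim_slip"] / committed,
        "bpred_addr_rate": r["bpred_bimod.addr_hits"] / r["bpred_bimod.updates"],
        "dl1.miss_rate": r["dl1.misses"] / r["dl1.accesses"],
        "ul2.miss_rate": r["ul2.misses"] / r["ul2.accesses"],
        "mem.ptab_miss_rate": r["mem.ptab_misses"] / r["mem.ptab_accesses"],
    }


if __name__ == "__main__":
    for name, value in derived(RAW).items():
        print(f"{name:20s} {value:10.4f}  reported {REPORTED[name]:.4f}")
```

Each recomputed ratio agrees with the printed value to within rounding. A different divisor would show up as a mismatch: dividing IFQ_count by committed rather than executed instructions gives about 5.3137 instead of the printed 5.3134.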

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:29:18 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multipliers/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multipliers/dividers available
# -pcstat              <null> # profile stat(s) against text addrs (multiple uses OK)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6822716 # total simulation time in cycles
sim_IPC                      1.9211 # instructions per cycle
sim_CPI                      0.5205 # cycles per instruction
sim_exec_BW                  1.9272 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25401712 # cumulative IFQ occupancy
IFQ_fcount                  6224825 # cumulative IFQ full count
ifq_occupancy                3.7231 # avg IFQ occupancy (insns)
ifq_rate                     1.9272 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9318 # avg IFQ occupant latency (cycles)
ifq_full                     0.9124 # fraction of time (cycles) IFQ was full
RUU_count                 104885675 # cumulative RUU occupancy
RUU_fcount                  5660986 # cumulative RUU full count
ruu_occupancy               15.3730 # avg RUU occupancy (insns)
ruu_rate                     1.9272 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9767 # avg RUU occupant latency (cycles)
ruu_full                     0.8297 # fraction of time (cycles) RUU was full
LSQ_count                  32051338 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6977 # avg LSQ occupancy (insns)
lsq_rate                     1.9272 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4376 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  153988053 # total number of slip cycles
avg_sim_slip                11.7484 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
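Each `bpred_bimod` rate is the ratio named in its comment. A minimal sketch recomputing them from this record's counters follows. The helper returns `None` for a zero denominator, which is the case the log reports as `<error: divide by zero>` (no non-RAS JRs were seen in this run). It also checks that `misses` equals `updates - dir_hits`; that relationship holds for the numbers here but is inferred, not documented.

```python
def rate(hits, total):
    """hits / total, or None when total is zero (printed by the simulator
    as '<error: divide by zero>')."""
    return None if total == 0 else hits / total


# Counters from the matrix run's bpred_bimod.* lines.
updates, addr_hits, dir_hits = 1010850, 1000567, 1000659
jr_hits, jr_seen = 46, 52
non_ras_hits, non_ras_seen = 0, 0
ras_hits, used_ras = 46, 52

rates = {
    "bpred_addr_rate": rate(addr_hits, updates),
    "bpred_dir_rate": rate(dir_hits, updates),
    "bpred_jr_rate": rate(jr_hits, jr_seen),
    "bpred_jr_non_ras_rate.PP": rate(non_ras_hits, non_ras_seen),
    "ras_rate.PP": rate(ras_hits, used_ras),
}
for name, value in rates.items():
    print(name, "<error: divide by zero>" if value is None else f"{value:.4f}")

# Inferred: direction mispredictions = updates that were not dir_hits.
print("misses", updates - dir_hits)
```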
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
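The cache `*_rate` lines divide each event count by `accesses` ("per ref"). A short sketch reproducing the dl1 and ul2 figures of this run follows. Note that the ul2 miss rate is local to the L2 (misses per L2 reference), which is why it looks high next to dl1's even though only a few thousand references ever reach the L2.

```python
def cache_rates(accesses, hits, misses, replacements, writebacks):
    """Per-reference rates, as in the '(i.e., x/ref)' comments of the log."""
    if hits + misses != accesses:
        raise ValueError("every access should be either a hit or a miss")
    return {
        "miss_rate": misses / accesses,
        "repl_rate": replacements / accesses,
        "wb_rate": writebacks / accesses,
    }


# Counters from the matrix run above.
dl1 = cache_rates(4013174, 4007447, 5727, 4703, 492)
ul2 = cache_rates(6397, 4129, 2268, 0, 0)  # local rate: per L2 reference

for name, r in (("dl1", dl1), ("ul2", ul2)):
    print(name, {k: round(v, 4) for k, v in r.items()})
```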
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
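`mem.page_mem` matches `mem.page_count` times a 4 KiB page (24 × 4 KiB = 96 KiB here), and `mem.ptab_miss_rate` is first-level page table misses over page table accesses. A small check follows; the 4 KiB page size is an assumption inferred from these numbers.

```python
PAGE_SIZE = 4096  # assumed 4 KiB simulator page, inferred from page_count vs page_mem


def page_mem_kib(page_count, page_size=PAGE_SIZE):
    """Memory allocated for the simulated program, in KiB."""
    return page_count * page_size // 1024


ptab_miss_rate = 7696533 / 77357593  # mem.ptab_misses / mem.ptab_accesses
print(page_mem_kib(24), "k")          # mem.page_mem
print(round(ptab_miss_rate, 4))       # mem.ptab_miss_rate
```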

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:29:26 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
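`-bpred bimod` with `-bpred:bimod 16384` selects a table of 16384 two-bit saturating counters indexed by branch address. The sketch below is a simplified model, not sim-outorder's code: the index `(pc >> 3) & (size - 1)` assumes 8-byte PISA instructions and ignores whatever hash the simulator actually uses, and the initial counter value of 1 (weakly not-taken) is also an assumption. The BTB (`-bpred:btb 512 4`) and the 16-entry return-address stack are not modelled.

```python
class BimodalPredictor:
    """Simplified bimodal predictor: one 2-bit saturating counter per entry.

    Counters 0-1 predict not-taken, 2-3 predict taken. The index hash and
    the initial counter value are assumptions, not sim-outorder's code.
    """

    def __init__(self, size=16384):
        if size <= 0 or size & (size - 1):
            raise ValueError("table size must be a power of two")
        self.size = size
        self.counters = [1] * size  # assumed start: weakly not-taken

    def _index(self, pc):
        # Assumes 8-byte PISA instructions, so the low 3 PC bits are dropped.
        return (pc >> 3) & (self.size - 1)

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


# A loop-closing branch taken 9 times, then falling through once.
bp = BimodalPredictor(16384)
pc = 0x00400140
outcomes = [True] * 9 + [False]
correct = 0
for taken in outcomes:
    correct += bp.predict(pc) == taken
    bp.update(pc, taken)
print(correct, "of", len(outcomes))
```

The two mispredictions are the first iteration (counter still weakly not-taken) and the loop exit. Because of the counter's hysteresis, the single not-taken outcome leaves the entry predicting taken for the next run of the loop.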

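With `-mem:lat 69 2` and `-mem:width 8`, and assuming the stock SimpleScalar 3.0 memory latency formula, the first bus-width chunk of a transfer costs 69 cycles and each further chunk 2 cycles, so a 64-byte ul2 block costs 69 + 7 × 2 = 83 cycles. The sketch below reproduces that formula. `-mem:maxBurstLength` and `-mem:minBurstLength` are not modelled, since their semantics are not described in this output.

```python
def mem_access_latency(blk_sz, first_chunk, inter_chunk, bus_width):
    """Cycles to move one block over the memory bus: the first bus-width
    chunk costs first_chunk cycles, every later chunk inter_chunk cycles.
    Burst-length limits (-mem:max/minBurstLength) are not modelled."""
    chunks = -(-blk_sz // bus_width)  # ceiling division
    return first_chunk + (chunks - 1) * inter_chunk


# A 64-byte ul2 block over the 8-byte bus is 8 chunks.
print(mem_access_latency(64, 69, 2, 8))    # this run:   -mem:lat 69 2  -> 83
print(mem_access_latency(64, 91, 10, 8))   # matrix run: -mem:lat 91 10 -> 161
```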
sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264709 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6369523 # total simulation time in cycles
sim_IPC                      1.8183 # instructions per cycle
sim_CPI                      0.5500 # cycles per instruction
sim_exec_BW                  1.9255 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
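The first derived metrics are simple ratios of the counters above: committed instructions, executed instructions, branches, cycles, and host seconds. Recomputing them for this sort run is a quick sanity check on how each one is defined.

```python
# Raw counters from the sort run above.
sim_num_insn = 11581513       # committed instructions
sim_total_insn = 12264709     # executed, including mis-speculated ones
sim_num_branches = 3128464    # committed branches
sim_cycle = 6369523
sim_elapsed_time = 8          # host seconds

derived = {
    "sim_IPC": sim_num_insn / sim_cycle,
    "sim_CPI": sim_cycle / sim_num_insn,
    "sim_exec_BW": sim_total_insn / sim_cycle,
    "sim_IPB": sim_num_insn / sim_num_branches,
    "sim_inst_rate": sim_num_insn / sim_elapsed_time,
}
for name, value in derived.items():
    print(f"{name:14s} {value:.4f}")
```

`sim_exec_BW` exceeds `sim_IPC` because it also counts the 683196 mis-speculated instructions (12264709 - 11581513) that were executed but never committed.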
IFQ_count                  18784879 # cumulative IFQ occupancy
IFQ_fcount                  3868060 # cumulative IFQ full count
ifq_occupancy                2.9492 # avg IFQ occupancy (insn's)
ifq_rate                     1.9255 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5316 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6073 # fraction of time (cycle's) IFQ was full
RUU_count                  77385633 # cumulative RUU occupancy
RUU_fcount                  3266133 # cumulative RUU full count
ruu_occupancy               12.1494 # avg RUU occupancy (insn's)
ruu_rate                     1.9255 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3096 # avg RUU occupant latency (cycle's)
ruu_full                     0.5128 # fraction of time (cycle's) RUU was full
LSQ_count                  32153311 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0480 # avg LSQ occupancy (insn's)
lsq_rate                     1.9255 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6216 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123606040 # total number of slip cycles
avg_sim_slip                10.6727 # the average slip between issue and retirement
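Each queue's `*_latency` appears to be its cumulative occupancy divided by the instructions executed (`sim_total_insn`). Occupancy, dispatch rate, and latency therefore obey Little's law (occupancy = rate × latency). `avg_sim_slip` likewise equals `sim_slip / sim_num_insn`. The sketch below uses the sort run's counters; the formulas are inferred from the printed values.

```python
sim_total_insn = 12264709
sim_num_insn = 11581513
sim_cycle = 6369523
sim_slip = 123606040
cumulative = {"IFQ": 18784879, "RUU": 77385633, "LSQ": 32153311}

dispatch_rate = sim_total_insn / sim_cycle  # printed as ifq_rate/ruu_rate/lsq_rate
results = {}
for name, count in cumulative.items():
    occupancy = count / sim_cycle       # avg instructions resident per cycle
    latency = count / sim_total_insn    # avg cycles an instruction stays
    # Little's law: occupancy = arrival rate * time in queue
    assert abs(occupancy - dispatch_rate * latency) < 1e-9
    results[name] = (occupancy, latency)
    print(f"{name}: occupancy={occupancy:.4f} latency={latency:.4f}")

avg_slip = sim_slip / sim_num_insn
print(f"avg_sim_slip={avg_slip:.4f}")
```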
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
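The JR counters in this record decompose cleanly. Every JR seen is counted either as a RAS prediction (`used_ras.PP`) or as a non-RAS one (`jr_non_ras_seen.PP`), and the hits split the same way. The fft record below satisfies the same identities (89859 + 1 = 89860 JRs). The sketch checks this and recomputes the three rates; the decomposition is inferred from the numbers, not taken from the simulator source.

```python
# JR counters from the sort run above.
jr_seen, jr_hits = 434901, 427904
used_ras, ras_hits = 426708, 426681
non_ras_seen, non_ras_hits = 8193, 1223

# Inferred split: each JR is counted either as a RAS prediction or a non-RAS one.
assert jr_seen == used_ras + non_ras_seen
assert jr_hits == ras_hits + non_ras_hits

ras_rate = ras_hits / used_ras               # bpred_bimod.ras_rate.PP
non_ras_rate = non_ras_hits / non_ras_seen   # bpred_bimod.bpred_jr_non_ras_rate.PP
jr_rate = jr_hits / jr_seen                  # bpred_bimod.bpred_jr_rate
print(f"{ras_rate:.4f} {non_ras_rate:.4f} {jr_rate:.4f}")
```

In this run the RAS handles almost all JRs with near-perfect accuracy, while the 8193 remaining indirect jumps are predicted correctly only about 15% of the time.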
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
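Because `-cache:il2 dl2` makes the L2 unified, `ul2.accesses` counts traffic from both L1 caches. For both runs shown here it equals dl1 misses plus dl1 writebacks plus il1 misses (11205 + 7079 + 217 = 18501 in this run). The sketch below shows that bookkeeping. The identity is inferred from these records, and TLB misses are assumed not to generate L2 references.

```python
def ul2_accesses_from_l1(dl1_misses, dl1_writebacks, il1_misses):
    """L2 references implied by L1 activity: one block fill per L1 miss
    plus one write per dirty dl1 block written back (inferred bookkeeping)."""
    return dl1_misses + dl1_writebacks + il1_misses


sort_l2 = ul2_accesses_from_l1(11205, 7079, 217)    # printed ul2.accesses: 18501
matrix_l2 = ul2_accesses_from_l1(5727, 492, 178)    # printed ul2.accesses: 6397
print(sort_l2, matrix_l2)

# ul2.miss_rate is local: misses per L2 reference.
print(round(4196 / sort_l2, 4))
```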
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
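Comparing runs is easier once a statistics block is loaded into a dictionary. The hypothetical helper below (`parse_stats` is not part of SimpleScalar) reads the `name value # description` lines that follow `sim: ** simulation statistics **` and stops at the dashed separator. It converts hex addresses, handles the `k` suffix of `mem.page_mem` (assumed to mean 1024 bytes, consistent with 73 pages × 4 KiB = 292k), and maps `<error: ...>` values to `None`.

```python
import re

# Hypothetical helper (not part of SimpleScalar): parse the
# "name  value # description" lines of a sim-outorder statistics block.
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+# (.*)$")


def parse_value(text):
    """Convert a printed stat value to int/float; None for '<error: ...>'."""
    if text.startswith("<error"):
        return None                       # e.g. '<error: divide by zero>'
    if text.startswith("0x"):
        return int(text, 16)              # segment base addresses
    if text.endswith("k") and text[:-1].isdigit():
        return int(text[:-1]) * 1024      # mem.page_mem, assuming k = 1024 bytes
    for conv in (int, float):
        try:
            return conv(text)
        except ValueError:
            pass
    return text                           # leave anything else as a string


def parse_stats(lines):
    """Collect stats between 'sim: ** simulation statistics **' and the
    dashed separator that starts the next run's output."""
    stats, in_stats = {}, False
    for line in lines:
        if line.startswith("sim: ** simulation statistics **"):
            in_stats = True
        elif in_stats and line.startswith("---"):
            break
        elif in_stats:
            m = STAT_LINE.match(line)
            if m:
                name, value, _description = m.groups()
                stats[name] = parse_value(value)
    return stats


sample = """\
sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_cycle                   6369523 # total simulation time in cycles
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   292k # total size of memory pages allocated
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate
--------------------------------------------------------------------------------
ignored_stat                      1 # after the separator
""".splitlines()

stats = parse_stats(sample)
print(round(stats["sim_num_insn"] / stats["sim_cycle"], 4))
```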

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:29:34 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
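The cache and TLB options use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, so capacity is nsets × bsize × assoc: 512 × 64 × 2 = 64 KiB for each L1 and 1024 × 64 × 8 = 512 KiB for ul2. For a TLB the block size is the 4096-byte page, giving 64 itlb entries (256 KiB reach) and 128 dtlb entries (512 KiB reach). The address split in the sketch below assumes power-of-two set counts and block sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """One '<name>:<nsets>:<bsize>:<assoc>:<repl>' cache or TLB spec."""
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @classmethod
    def parse(cls, spec):
        name, nsets, bsize, assoc, repl = spec.split(":")
        return cls(name, int(nsets), int(bsize), int(assoc), repl)

    @property
    def blocks(self):
        """Total blocks (or TLB entries)."""
        return self.nsets * self.assoc

    @property
    def size_bytes(self):
        """Capacity, or address reach for a TLB."""
        return self.blocks * self.bsize

    def split_address(self, addr):
        """Return (tag, set index, block offset); assumes power-of-two
        nsets and bsize."""
        offset_bits = self.bsize.bit_length() - 1
        index_bits = self.nsets.bit_length() - 1
        offset = addr & (self.bsize - 1)
        index = (addr >> offset_bits) & (self.nsets - 1)
        tag = addr >> (offset_bits + index_bits)
        return tag, index, offset


for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    c = CacheConfig.parse(spec)
    print(f"{c.name}: {c.blocks} blocks, {c.size_bytes // 1024} KiB")
```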

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375132 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9387256 # total simulation time in cycles
sim_IPC                      1.4190 # instructions per cycle
sim_CPI                      0.7047 # cycles per instruction
sim_exec_BW                  1.4248 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36573399 # cumulative IFQ occupancy
IFQ_fcount                  8992705 # cumulative IFQ full count
ifq_occupancy                3.8961 # avg IFQ occupancy (insn's)
ifq_rate                     1.4248 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7344 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 147206538 # cumulative RUU occupancy
RUU_fcount                  8851782 # cumulative RUU full count
ruu_occupancy               15.6815 # avg RUU occupancy (insn's)
ruu_rate                     1.4248 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.0060 # avg RUU occupant latency (cycle's)
ruu_full                     0.9430 # fraction of time (cycle's) RUU was full
LSQ_count                  77371891 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2422 # avg LSQ occupancy (insn's)
lsq_rate                     1.4248 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7848 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  244293735 # total number of slip cycles
avg_sim_slip                18.3391 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:29:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12580656 # total simulation time in cycles
sim_IPC                      1.6994 # instructions per cycle
sim_CPI                      0.5885 # cycles per instruction
sim_exec_BW                  1.7050 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49171590 # cumulative IFQ occupancy
IFQ_fcount                 11673725 # cumulative IFQ full count
ifq_occupancy                3.9085 # avg IFQ occupancy (insn's)
ifq_rate                     1.7050 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2924 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199823036 # cumulative RUU occupancy
RUU_fcount                 12453600 # cumulative RUU full count
ruu_occupancy               15.8834 # avg RUU occupancy (insn's)
ruu_rate                     1.7050 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3157 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64059687 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0919 # avg LSQ occupancy (insn's)
lsq_rate                     1.7050 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9864 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291665518 # total number of slip cycles
avg_sim_slip                13.6425 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:29:58 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861259 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37658972 # total simulation time in cycles
sim_IPC                      0.7398 # instructions per cycle
sim_CPI                      1.3517 # cycles per instruction
sim_exec_BW                  0.7398 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 150559065 # cumulative IFQ occupancy
IFQ_fcount                 37639526 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7398 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4039 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 602239037 # cumulative RUU occupancy
RUU_fcount                 37638788 # cumulative RUU full count
ruu_occupancy               15.9919 # avg RUU occupancy (insn's)
ruu_rate                     0.7398 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.6156 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 183056580 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8609 # avg LSQ occupancy (insn's)
lsq_rate                     0.7398 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5703 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  821804272 # total number of slip cycles
avg_sim_slip                29.4980 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
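
Several of the summary statistics above are ratios of the raw counters. The Python sketch below parses a stats dump and recomputes those ratios. The formulas were inferred because they reproduce this run's printed values (for example, sim_IPC = sim_num_insn / sim_cycle and ruu_latency = RUU_count / sim_total_insn). They were not taken from the sim-outorder source. `parse_stats` and `pipeline_metrics` are hypothetical helper names.

```python
from typing import Dict, Union

Stat = Union[float, int, str]


def parse_stats(text: str) -> Dict[str, Stat]:
    """Parse 'name value # description' lines from a sim-outorder stats dump.

    Feed it the text after 'sim: ** simulation statistics **'. Lines without
    exactly one value before the '#' (banners, blank lines and
    '<error: divide by zero>' entries) are skipped.
    """
    stats: Dict[str, Stat] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue
        name, raw = fields
        if raw.startswith("0x"):
            stats[name] = int(raw, 16)   # ld_* segment addresses
            continue
        try:
            stats[name] = float(raw)     # counters and rates
        except ValueError:
            stats[name] = raw            # e.g. mem.page_mem '1480k'
    return stats


def pipeline_metrics(s: Dict[str, Stat]) -> Dict[str, float]:
    """Recompute headline ratios from the raw counters."""
    cycles = s["sim_cycle"]
    return {
        "sim_IPC": s["sim_num_insn"] / cycles,
        "sim_CPI": cycles / s["sim_num_insn"],
        "sim_exec_BW": s["sim_total_insn"] / cycles,
        "ruu_occupancy": s["RUU_count"] / cycles,
        "ruu_latency": s["RUU_count"] / s["sim_total_insn"],
        "avg_sim_slip": s["sim_slip"] / s["sim_num_insn"],
    }


# Counters copied from the matrix run above.
RUN1 = """\
sim_num_insn               27859692 # total number of instructions committed
sim_total_insn             27861259 # total number of instructions executed
sim_cycle                  37658972 # total simulation time in cycles
RUU_count                 602239037 # cumulative RUU occupancy
sim_slip                  821804272 # total number of slip cycles
mem.page_mem                  1480k # total size of memory pages allocated
"""

metrics = pipeline_metrics(parse_stats(RUN1))
for key, value in metrics.items():
    print(f"{key:14s} {value:.4f}")
```

ruu_rate is printed equal to sim_exec_BW here. Multiplying it by ruu_latency gives exactly RUU_count / sim_cycle, so ruu_occupancy obeys Little's law by construction. Neither command line sets `-ruu:size`, so the RUU size is the 16 shown in the option dumps. An average occupancy of about 15.99 against that size is consistent with ruu_full = 0.9995: the window is almost always full in this run.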

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 
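
This command line differs from the previous `matrix` run only in `-mem:width`, `-mem:lat` and `-mem:minBurstLength`. The Python sketch below finds such differences mechanically. It relies on two assumptions, both of which hold for every command line in this log. First, the simulated program is the last token and takes no arguments. Second, no option value starts with `-`. `parse_cmdline` and `diff_options` are illustrative helpers, not simulator features.

```python
from typing import Dict, Tuple

Options = Dict[str, Tuple[str, ...]]


def parse_cmdline(line: str) -> Tuple[Options, str]:
    """Split a sim-outorder command line into ({option: values}, program).

    Assumes the program is the final token and takes no arguments, and that
    no option value begins with '-' (true for the command lines in this log).
    """
    tokens = line.split()
    if tokens and tokens[0] == "sim-outorder":
        tokens = tokens[1:]
    program = tokens[-1]
    opts: Dict[str, list] = {}
    current = None
    for tok in tokens[:-1]:
        if tok.startswith("-"):
            current = tok
            opts[current] = []
        elif current is not None:
            opts[current].append(tok)  # e.g. '-mem:lat 69 2' -> ['69', '2']
    return {k: tuple(v) for k, v in opts.items()}, program


def diff_options(a: Options, b: Options) -> Dict[str, tuple]:
    """Options whose values differ; None marks an option absent on one side."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


# Shared prefix of both matrix command lines in this log.
COMMON = ("sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 "
          "-res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l "
          "-cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 "
          "-cache:dl2lat 12 -mem:maxBurstLength 8")
PREV = COMMON + " -mem:width 8 -mem:lat 91 10 -mem:minBurstLength 2 -redir:sim tempOutput matrix"
THIS = COMMON + " -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix"

prev_opts, prev_prog = parse_cmdline(PREV)
this_opts, this_prog = parse_cmdline(THIS)
for opt, (old, new) in diff_options(prev_opts, this_opts).items():
    print(opt, " ".join(old or ()), "->", " ".join(new or ()))
```

ld_text_size and ld_data_size differ between the two `matrix` runs (23472 vs 23136 bytes of text, 4096 vs 121104 bytes of data). The committed instruction counts differ as well. The program binaries or their inputs therefore appear to differ, so the statistics should not be attributed to the memory settings alone.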

sim: simulation started @ Thu Dec 15 11:30:22 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
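
The memory options differ between the two `matrix` runs: the earlier run used `-mem:width 8 -mem:lat 91 10` and this one uses `-mem:width 16 -mem:lat 69 2`. The sketch below assumes the stock sim-outorder memory model. Under that model, a block is moved in ceil(block size / bus width) chunks. The first chunk costs `<first_chunk>` cycles and each remaining chunk costs `<inter_chunk>` cycles. The `-mem:minBurstLength`/`-mem:maxBurstLength` options do not appear to be part of that stock model, so their effect is deliberately left out. `mem_access_latency` here is an illustrative reimplementation, not the simulator's code.

```python
def mem_access_latency(blk_sz: int, bus_width: int, first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from main memory (stock-model sketch).

    The block is split into ceil(blk_sz / bus_width) bus transfers; the first
    costs first_chunk cycles and each later one inter_chunk cycles.
    Burst-length limits (-mem:minBurstLength / -mem:maxBurstLength) are NOT
    modeled.
    """
    if blk_sz <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-blk_sz // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


UL2_BLOCK = 64  # <bsize> of ul2:1024:64:8:l
previous = mem_access_latency(UL2_BLOCK, bus_width=8, first_chunk=91, inter_chunk=10)
current = mem_access_latency(UL2_BLOCK, bus_width=16, first_chunk=69, inter_chunk=2)
print(previous, current)  # 161 75
```

Under this model, filling a 64-byte ul2 block costs 161 cycles with the earlier settings and 75 cycles with these. Both figures exclude the separate TLB miss latency (`-tlb:lat 30`).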

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6807012 # total simulation time in cycles
sim_IPC                      1.9255 # instructions per cycle
sim_CPI                      0.5193 # cycles per instruction
sim_exec_BW                  1.9317 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25344496 # cumulative IFQ occupancy
IFQ_fcount                  6210521 # cumulative IFQ full count
ifq_occupancy                3.7233 # avg IFQ occupancy (insn's)
ifq_rate                     1.9317 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9275 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104656187 # cumulative RUU occupancy
RUU_fcount                  5646682 # cumulative RUU full count
ruu_occupancy               15.3748 # avg RUU occupancy (insn's)
ruu_rate                     1.9317 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9593 # avg RUU occupant latency (cycle's)
ruu_full                     0.8295 # fraction of time (cycle's) RUU was full
LSQ_count                  31974666 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6973 # avg LSQ occupancy (insn's)
lsq_rate                     1.9317 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4317 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153682037 # total number of slip cycles
avg_sim_slip                11.7251 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
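
The `<error: divide by zero>` entry above appears because `bpred_bimod.jr_non_ras_seen.PP` is 0 in this run, which leaves the non-RAS JR rate with an empty denominator. The Python sketch below recomputes the predictor rates from their counters and returns `None` in that case. It also checks one relation that holds in both `matrix` runs: ul2.accesses equals il1.misses + dl1.misses + dl1.writebacks (211 + 2572213 + 957274 = 3529698 earlier, 178 + 5727 + 492 = 6397 here). That relation is an observation from these dumps, not a documented guarantee. `rate`, `bpred_rates` and `expected_l2_accesses` are hypothetical names.

```python
from typing import Dict, Optional


def rate(num: float, den: float) -> Optional[float]:
    """num/den, or None where sim-outorder prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


def fmt(value: Optional[float]) -> str:
    """Render a rate the way the stats dump does."""
    return "<error: divide by zero>" if value is None else f"{value:.4f}"


def bpred_rates(s: Dict[str, float], p: str = "bpred_bimod.") -> Dict[str, Optional[float]]:
    return {
        "bpred_addr_rate": rate(s[p + "addr_hits"], s[p + "updates"]),
        "bpred_dir_rate": rate(s[p + "dir_hits"], s[p + "updates"]),
        "bpred_jr_rate": rate(s[p + "jr_hits"], s[p + "jr_seen"]),
        "bpred_jr_non_ras_rate.PP": rate(s[p + "jr_non_ras_hits.PP"], s[p + "jr_non_ras_seen.PP"]),
        "ras_rate.PP": rate(s[p + "ras_hits.PP"], s[p + "used_ras.PP"]),
    }


def expected_l2_accesses(s: Dict[str, float]) -> float:
    """Observed relation (not a documented rule): every il1 miss, dl1 miss and
    dl1 writeback becomes one access to the unified L2."""
    return s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]


# Counters copied from the matrix run above (-mem:width 16, -mem:lat 69 2).
RUN2 = {
    "bpred_bimod.updates": 1010850, "bpred_bimod.addr_hits": 1000567,
    "bpred_bimod.dir_hits": 1000659, "bpred_bimod.jr_hits": 46,
    "bpred_bimod.jr_seen": 52, "bpred_bimod.jr_non_ras_hits.PP": 0,
    "bpred_bimod.jr_non_ras_seen.PP": 0, "bpred_bimod.used_ras.PP": 52,
    "bpred_bimod.ras_hits.PP": 46,
    "il1.misses": 178, "dl1.misses": 5727, "dl1.writebacks": 492,
    "ul2.accesses": 6397, "ul2.misses": 2268,
}

for name, value in bpred_rates(RUN2).items():
    print(f"{name:26s} {fmt(value)}")
print("ul2.accesses matches:", expected_l2_accesses(RUN2) == RUN2["ul2.accesses"])
print("ul2.miss_rate", fmt(rate(RUN2["ul2.misses"], RUN2["ul2.accesses"])))
```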

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:30:30 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
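
These runs select `-bpred bimod` with `-bpred:bimod 16384`, a 16384-entry bimodal table. A bimodal predictor is conventionally a table of 2-bit saturating counters indexed by branch-address bits, which predicts taken when the counter reads 2 or 3. The Python sketch below illustrates that scheme. Two details are assumptions rather than SimpleScalar's exact `bpred.c` behavior: the index function `(pc >> 3) & (size - 1)`, which assumes 8-byte PISA instructions, and the weakly-not-taken initial counter value. The sketch also leaves out the BTB (`-bpred:btb 512 4`) and return-address stack (`-bpred:ras 16`) that supply branch targets.

```python
class BimodalPredictor:
    """Direction predictor: one 2-bit saturating counter per table entry.

    Counter values 0-1 predict not-taken, 2-3 predict taken. The index
    function and the initial value (1, weakly not-taken) are assumptions
    for illustration, not SimpleScalar's exact bpred.c behavior.
    """

    def __init__(self, size: int = 16384) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("table size must be a power of two")
        self.size = size
        self.table = [1] * size
        self.branches = 0
        self.misses = 0

    def _index(self, pc: int) -> int:
        # Drop the 3 low bits (assumed 8-byte PISA instructions), keep log2(size) bits.
        return (pc >> 3) & (self.size - 1)

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Score the prediction for this branch, then train its counter."""
        i = self._index(pc)
        self.branches += 1
        if self.predict(pc) != taken:
            self.misses += 1
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


bp = BimodalPredictor(16384)   # -bpred:bimod 16384
loop_branch = 0x00400200       # hypothetical branch address inside the text segment
for _ in range(10):            # ten executions of a ten-iteration loop
    for taken in [True] * 9 + [False]:
        bp.update(loop_branch, taken)
print(bp.misses, "of", bp.branches)  # 11 of 100
```

A loop branch that is taken nine times and then falls through costs one cold miss. After its counter saturates, it costs one misprediction per loop exit. This shows how loop-dominated code can reach direction-prediction rates close to 1.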

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264677 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6347747 # total simulation time in cycles
sim_IPC                      1.8245 # instructions per cycle
sim_CPI                      0.5481 # cycles per instruction
sim_exec_BW                  1.9321 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18704591 # cumulative IFQ occupancy
IFQ_fcount                  3847988 # cumulative IFQ full count
ifq_occupancy                2.9467 # avg IFQ occupancy (insns)
ifq_rate                     1.9321 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5251 # avg IFQ occupant latency (cycles)
ifq_full                     0.6062 # fraction of time (cycles) IFQ was full
RUU_count                  77064313 # cumulative RUU occupancy
RUU_fcount                  3246069 # cumulative RUU full count
ruu_occupancy               12.1404 # avg RUU occupancy (insns)
ruu_rate                     1.9321 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2834 # avg RUU occupant latency (cycles)
ruu_full                     0.5114 # fraction of time (cycles) RUU was full
LSQ_count                  32080855 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0539 # avg LSQ occupancy (insns)
lsq_rate                     1.9321 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6157 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  123212264 # total number of slip cycles
avg_sim_slip                10.6387 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
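Most of the rates above are ratios of raw counters from the same block. The sketch below recomputes several of them from the statistics block that ends here. The formulas are inferred from the counter names and descriptions, not copied from sim-outorder's source. Each one reproduces the logged value to the four printed decimals.

```python
# Raw counters copied from the statistics block above.
stats = {
    "sim_num_insn": 11581513,
    "sim_total_insn": 12264677,
    "sim_num_branches": 3128464,
    "sim_cycle": 6347747,
    "sim_elapsed_time": 8,
    "RUU_count": 77064313,
    "dl1.accesses": 4497784,
    "dl1.misses": 11205,
    "bpred_bimod.dir_hits": 2990777,
    "bpred_bimod.updates": 3128464,
}

# Values printed in the log, for comparison.
reported = {
    "sim_IPC": 1.8245,
    "sim_CPI": 0.5481,
    "sim_exec_BW": 1.9321,
    "sim_IPB": 3.7020,
    "sim_inst_rate": 1447689.1250,
    "ruu_occupancy": 12.1404,
    "ruu_latency": 6.2834,
    "dl1.miss_rate": 0.0025,
    "bpred_bimod.bpred_dir_rate": 0.9560,
}


def derived(s: dict) -> dict:
    """Recompute derived statistics from raw counters (formulas assumed)."""
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_exec_BW": s["sim_total_insn"] / s["sim_cycle"],  # includes mis-speculated insns
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "sim_inst_rate": s["sim_num_insn"] / s["sim_elapsed_time"],
        "ruu_occupancy": s["RUU_count"] / s["sim_cycle"],      # averaged per cycle
        "ruu_latency": s["RUU_count"] / s["sim_total_insn"],   # averaged per instruction
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "bpred_bimod.bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
    }


for name, value in derived(stats).items():
    print(f"{name:28s} computed {value:14.4f}   logged {reported[name]:14.4f}")
```

The latency figure is Little's law. Dividing average occupancy (RUU_count / sim_cycle) by throughput (sim_total_insn / sim_cycle) leaves RUU_count / sim_total_insn, the average number of cycles an instruction spends in the RUU.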

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:30:38 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addrs (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
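The `-mem:lat` option gives a first-chunk and an inter-chunk latency, and `-mem:width` gives the bus width in bytes. The sketch below assumes the usual SimpleScalar model:

- A block moves as ceil(bsize / width) bus-width chunks.
- One memory access costs first + inter x (chunks - 1) cycles.

`mem_access_latency` is a hypothetical helper. It ignores `-mem:maxBurstLength` and `-mem:minBurstLength`, whose effect on timing is not described in this log, so treat the results as estimates.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block over the memory bus (burst options ignored)."""
    if block_size <= 0 or bus_width <= 0:
        raise ValueError("block_size and bus_width must be positive")
    chunks = -(-block_size // bus_width)  # ceiling division: bus transfers per block
    return first_chunk + inter_chunk * (chunks - 1)


# Options listed above: 64-byte ul2 blocks, -mem:width 16, -mem:lat 69 2.
print(mem_access_latency(64, 16, 69, 2))   # 4 chunks -> 69 + 3*2 = 75
# Command line at the top of this log: -mem:width 8, -mem:lat 91 10.
print(mem_access_latency(64, 8, 91, 10))   # 8 chunks -> 91 + 7*10 = 161
```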

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced, and those that start with `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  A
  second argument prefixed with `+' is relative to the first argument,
  e.g., 1000:+100 == 1000:1100.  Program symbols may be used in all
  contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
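The range grammar above can be sketched as a small parser. `parse_ptrace_range` is a hypothetical helper, not sim-outorder's code, and it makes two simplifications:

- Numbers are read with Python's `int(text, 0)`, so `0x` addresses work.
- Program symbols such as `main` are kept as unresolved strings, because resolving them needs the simulated program's symbol table.

```python
from typing import Optional, Tuple, Union

Value = Union[int, str]                 # numeric value, or an unresolved program symbol
Endpoint = Optional[Tuple[str, Value]]  # (kind, value), or None when omitted

# Prefixes described in the help text; anything else is an instruction count.
KINDS = {"@": "address", "#": "cycle"}


def _value(text: str) -> Value:
    try:
        return int(text, 0)             # accepts 2000 or 0x400140
    except ValueError:
        return text                     # assume a program symbol such as 'main'


def _endpoint(text: str) -> Endpoint:
    if not text:
        return None
    if text[0] in KINDS:
        return (KINDS[text[0]], _value(text[1:]))
    return ("insn", _value(text))


def parse_ptrace_range(spec: str) -> Tuple[Endpoint, Endpoint]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"range must contain ':', got {spec!r}")
    start = _endpoint(start_text)
    if end_text.startswith("+"):
        if start is None:
            raise ValueError("a relative end ('+N') needs a start")
        kind, base = start
        offset = _value(end_text[1:])
        if isinstance(base, int) and isinstance(offset, int):
            return start, (kind, base + offset)
        return start, (kind, f"{base}+{offset}")  # symbolic; left unresolved here
    return start, _endpoint(end_text)


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, "->", parse_ptrace_range(spec))
```

When typing these at a POSIX shell prompt, quote any argument that begins with `#` (for example `'#0:#1000'`). An unquoted word starting with `#` begins a shell comment.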

  Branch predictor configuration examples for the 2-level predictor:
    Configurations:   N, M, W, X  (the order of the -bpred:2lev arguments)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
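A two-level predictor keeps branch-history shift registers in its first level. It uses them, together with address bits, to index a second-level table of 2-bit saturating counters. The class below is a generic illustration of that scheme, parameterized like `-bpred:2lev <l1size> <l2size> <hist_size> <xor>`. It is not SimpleScalar's `bpred.c`, whose exact index arithmetic may differ. The caller is assumed to pass a word-indexed branch address (byte PC already shifted right by the instruction size).

```python
from typing import Tuple


class TwoLevelPredictor:
    """Illustrative two-level adaptive predictor (l1size=N, l2size=M, hist_width=W, use_xor=X)."""

    def __init__(self, l1size: int, l2size: int, hist_width: int, use_xor: bool):
        if l2size <= 0 or l2size & (l2size - 1):
            raise ValueError("l2size must be a power of two")
        self.l1size, self.l2size = l1size, l2size
        self.hist_width, self.use_xor = hist_width, use_xor
        self.history = [0] * l1size   # first level: branch-history shift registers
        self.counters = [1] * l2size  # second level: 2-bit counters, weakly not-taken

    def _indices(self, addr: int) -> Tuple[int, int]:
        l1 = addr % self.l1size       # which history register this branch uses
        hist = self.history[l1]
        if self.use_xor:
            l2 = (hist ^ addr) & (self.l2size - 1)  # gshare-style hashing
        else:
            # Place address bits above the W history bits (GAp/PAp-style).
            l2 = ((addr << self.hist_width) | hist) & (self.l2size - 1)
        return l1, l2

    def predict(self, addr: int) -> bool:
        _, l2 = self._indices(addr)
        return self.counters[l2] >= 2  # counter values 2 and 3 mean "taken"

    def update(self, addr: int, taken: bool) -> None:
        l1, l2 = self._indices(addr)
        c = self.counters[l2]
        self.counters[l2] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.hist_width) - 1
        self.history[l1] = ((self.history[l1] << 1) | int(taken)) & mask


# GAg with W = 2 (N = 1, M = 2^W = 4, X = 0) on a strictly alternating branch.
gag = TwoLevelPredictor(l1size=1, l2size=4, hist_width=2, use_xor=False)
mispredicts = []
for i in range(100):
    taken = i % 2 == 0                # T, N, T, N, ...
    mispredicts.append(gag.predict(0) != taken)
    gag.update(0, taken)
print(sum(mispredicts), sum(mispredicts[3:]))  # prints: 2 0
```

After a short warm-up the history register distinguishes the two phases of the pattern, which a single bimodal counter cannot do. By the table above, the default `-bpred:2lev 1 1024 8 0` is a GAp configuration, since 1024 > 2^8. It is inactive in these runs, which use `-bpred bimod`.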

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375068 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9237088 # total simulation time in cycles
sim_IPC                      1.4421 # instructions per cycle
sim_CPI                      0.6934 # cycles per instruction
sim_exec_BW                  1.4480 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  35995575 # cumulative IFQ occupancy
IFQ_fcount                  8848249 # cumulative IFQ full count
ifq_occupancy                3.8969 # avg IFQ occupancy (insns)
ifq_rate                     1.4480 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6912 # avg IFQ occupant latency (cycles)
ifq_full                     0.9579 # fraction of time (cycles) IFQ was full
RUU_count                 144893394 # cumulative RUU occupancy
RUU_fcount                  8707342 # cumulative RUU full count
ruu_occupancy               15.6860 # avg RUU occupancy (insns)
ruu_rate                     1.4480 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.8331 # avg RUU occupant latency (cycles)
ruu_full                     0.9427 # fraction of time (cycles) RUU was full
LSQ_count                  75995283 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2272 # avg LSQ occupancy (insns)
lsq_rate                     1.4480 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6819 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  240604143 # total number of slip cycles
avg_sim_slip                18.0621 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
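With `-cache:il2 dl2` (as in the option list above), ul2 serves misses from both the instruction and data sides. The sketch below makes two accounting checks.

First check, ul2 traffic. Suppose each il1 miss, each dl1 miss and each dl1 writeback produces exactly one ul2 access. Then ul2.accesses should equal their sum. The counters in both complete statistics blocks of this section satisfy this exactly.

Second check, cold fills. It uses the dl1 and ul2 geometry from the option list, and it assumes that a miss filling an empty frame is not counted as a replacement. Once every frame is occupied, misses minus replacements should then equal nsets x assoc.

The dictionaries and `expected_l2_accesses` are illustrative; only the numbers come from the log.

```python
# Counters copied from the two complete statistics blocks in this section.
runs = {
    "first":  {"il1.misses": 217, "dl1.misses": 11205,
               "dl1.writebacks": 7079, "ul2.accesses": 18501},
    "second": {"il1.misses": 727, "dl1.misses": 287722,
               "dl1.writebacks": 143444, "ul2.accesses": 431893},
}


def expected_l2_accesses(c: dict) -> int:
    # Assumed traffic into a unified L2: every il1 miss, every dl1 miss,
    # and every dirty dl1 block written back.
    return c["il1.misses"] + c["dl1.misses"] + c["dl1.writebacks"]


for name, c in runs.items():
    print(f"{name}: predicted {expected_l2_accesses(c)}, logged {c['ul2.accesses']}")

# Second block, geometry from the option list: dl1:512:64:2:l, ul2:1024:64:8:l.
frames = {"dl1": 512 * 2, "ul2": 1024 * 8}
second_block = {"dl1": (287722, 286698), "ul2": (32394, 24202)}  # (misses, replacements)
for cache, (misses, repl) in second_block.items():
    # Misses into empty frames are assumed not to count as replacements, so
    # with every frame occupied, misses - replacements equals the frame count.
    print(f"{cache}: misses - replacements = {misses - repl}, frames = {frames[cache]}")
```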

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:30:49 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addrs (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12562688 # total simulation time in cycles
sim_IPC                      1.7018 # instructions per cycle
sim_CPI                      0.5876 # cycles per instruction
sim_exec_BW                  1.7074 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49105286 # cumulative IFQ occupancy
IFQ_fcount                 11657149 # cumulative IFQ full count
ifq_occupancy                3.9088 # avg IFQ occupancy (insn's)
ifq_rate                     1.7074 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2893 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199557428 # cumulative RUU occupancy
RUU_fcount                 12437024 # cumulative RUU full count
ruu_occupancy               15.8849 # avg RUU occupancy (insn's)
ruu_rate                     1.7074 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3033 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  63976391 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0926 # avg LSQ occupancy (insn's)
lsq_rate                     1.7074 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9826 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291316662 # total number of slip cycles
avg_sim_slip                13.6261 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
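Most ratio statistics above are formulas over the raw counters. The Python sketch below re-derives several of them from an excerpt of the matrix run. `parse_stats` and `derived` are hypothetical helpers. The formulas are inferred from the stat descriptions and confirmed by the logged numbers; they are not copied from the simulator source.

```python
RUN1 = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_total_insn             21450114 # total number of instructions executed
sim_cycle                  12562688 # total simulation time in cycles
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
"""


def parse_stats(text):
    """Parse 'name value # description' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].split()
        if len(body) != 2:
            continue          # banners, blanks, '<error: divide by zero>'
        name, value = body
        try:
            stats[name] = float(value)
        except ValueError:
            pass              # hex addresses, '120k', etc. are skipped
    return stats


def derived(s):
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_exec_BW": s["sim_total_insn"] / s["sim_cycle"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


for name, value in derived(parse_stats(RUN1)).items():
    print(f"{name:16s} {value:.4f}")
```

Each printed value matches the corresponding logged statistic to four decimal places.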

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:31:03 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
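The `-mem:lat` pair gives a first-chunk latency plus an extra latency for each further bus-width chunk of a block. The Python sketch below assumes the stock sim-outorder memory model, where chunks = ceil(block size / bus width), and the 64-byte ul2 blocks used here. Under those assumptions, this run's `-mem:lat 69 2 -mem:width 16` gives 75 cycles per ul2 miss. The `-mem:maxBurstLength` and `-mem:minBurstLength` options appear to come from a modified simulator, so the sketch ignores them. The binary that produced this log may account for them differently.

```python
def mem_access_latency(blk_size, bus_width, first_chunk, inter_chunk):
    """Cycles to move one block over the memory bus (stock model, assumed)."""
    chunks = (blk_size + bus_width - 1) // bus_width  # ceil(blk_size / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)


# -mem:lat and -mem:width from the three runs in this log, 64-byte ul2 blocks
for lat, width in (((91, 10), 8), ((69, 2), 16), ((69, 2), 32)):
    print(f"width {width:2d}: {mem_access_latency(64, width, *lat)} cycles")
```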

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
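The range grammar above can be read mechanically: an optional kind prefix on each end, and a `+` on the end meaning "relative to start". The Python sketch below is a minimal parser under stated assumptions, not SimpleScalar's own. The `symbols` map is a hypothetical stand-in for the program's symbol table. Numbers are assumed to be decimal or 0x-prefixed hex. Endpoints of different kinds are rejected.

```python
def parse_range(spec, symbols=None):
    """Parse '{{@|#}<start>}:{{@|#|+}<end>}' into (kind, start, end).

    kind is 'addr' (@), 'cycle' (#) or 'insn' (no prefix); a missing
    endpoint is None, and ':' alone means the whole execution.
    """
    symbols = symbols or {}
    kinds = {"@": "addr", "#": "cycle"}
    start_tok, sep, end_tok = spec.partition(":")
    if not sep:
        raise ValueError("range needs a ':' separator")

    def value(tok):
        # Program symbols may stand in for numbers (hypothetical table).
        return symbols[tok] if tok in symbols else int(tok, 0)

    kind, start, end = "insn", None, None
    if start_tok:
        if start_tok[0] in kinds:
            kind, start_tok = kinds[start_tok[0]], start_tok[1:]
        start = value(start_tok)
    if end_tok:
        if end_tok[0] == "+":
            if start is None:
                raise ValueError("'+' end needs an explicit start")
            end = start + value(end_tok[1:])   # relative to the start
        else:
            end_kind = "insn"
            if end_tok[0] in kinds:
                end_kind, end_tok = kinds[end_tok[0]], end_tok[1:]
            if start is not None and end_kind != kind:
                raise ValueError("range endpoints must be the same kind")
            kind, end = end_kind, value(end_tok)
    return kind, start, end


for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
    print(spec, parse_range(spec))
print(parse_range("@main:+278", {"main": 0x400200}))  # hypothetical address
```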

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861195 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37297140 # total simulation time in cycles
sim_IPC                      0.7470 # instructions per cycle
sim_CPI                      1.3387 # cycles per instruction
sim_exec_BW                  0.7470 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 149118425 # cumulative IFQ occupancy
IFQ_fcount                 37279366 # cumulative IFQ full count
ifq_occupancy                3.9981 # avg IFQ occupancy (insn's)
ifq_rate                     0.7470 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3522 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 596476133 # cumulative RUU occupancy
RUU_fcount                 37278644 # cumulative RUU full count
ruu_occupancy               15.9925 # avg RUU occupancy (insn's)
ruu_rate                     0.7470 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.4088 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 181315372 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8614 # avg LSQ occupancy (insn's)
lsq_rate                     0.7470 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5078 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  814300192 # total number of slip cycles
avg_sim_slip                29.2286 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
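The queue statistics in the alphaBlend run obey Little's law: average occupancy equals the dispatch rate times the average residence time. Each *_latency is therefore the cumulative *_count divided by sim_total_insn. The Python sketch below checks this against the logged counters; the helper name is hypothetical. The RUU averages 15.99 entries against `-ruu:size 16`, and ruu_full is 0.9995, which suggests the 16-entry RUU is what limits this run.

```python
def littles_law(cumulative_count, sim_cycle, sim_total_insn):
    """Return (average occupancy, average latency) for one queue."""
    occupancy = cumulative_count / sim_cycle   # L: average entries resident
    rate = sim_total_insn / sim_cycle          # lambda: insts dispatched per cycle
    latency = occupancy / rate                 # W = L / lambda, in cycles
    return occupancy, latency


SIM_CYCLE = 37297140
SIM_TOTAL_INSN = 27861195
for name, count in (("ifq", 149118425), ("ruu", 596476133), ("lsq", 181315372)):
    occ, lat = littles_law(count, SIM_CYCLE, SIM_TOTAL_INSN)
    print(f"{name}: occupancy {occ:.4f}, latency {lat:.4f}")
```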

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:31:26 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
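The aliasing rule above amounts to "an instruction level that names dl1 or dl2 shares that data cache object". A minimal sketch, assuming caches are represented as plain dicts; `resolve_hierarchy` is a hypothetical helper for illustration, not SimpleScalar's code.

```python
def resolve_hierarchy(args: dict) -> dict:
    """Map each level (il1, il2, dl1, dl2) to the cache object it uses.

    A value of "dl1"/"dl2" makes an instruction level share the data
    cache's object; "none" (or a missing entry) disables the level;
    anything else is a <config> string that creates a separate cache.
    """
    caches = {}
    for level in ("dl1", "dl2"):              # build the data side first
        spec = args.get(level, "none")
        caches[level] = None if spec == "none" else {"config": spec}
    for level in ("il1", "il2"):
        spec = args.get(level, "none")
        if spec in ("dl1", "dl2"):
            caches[level] = caches[spec]      # same object: unified level
        elif spec == "none":
            caches[level] = None
        else:
            caches[level] = {"config": spec}  # private instruction cache
    return caches
```

With the unified-L2 example, il2 and dl2 resolve to the same object, so instruction misses and data misses compete for the same L2 blocks.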



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6930681 # total simulation time in cycles
sim_IPC                      1.8912 # instructions per cycle
sim_CPI                      0.5288 # cycles per instruction
sim_exec_BW                  1.8972 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25795072 # cumulative IFQ occupancy
IFQ_fcount                  6323165 # cumulative IFQ full count
ifq_occupancy                3.7219 # avg IFQ occupancy (insn's)
ifq_rate                     1.8972 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9618 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 106463405 # cumulative RUU occupancy
RUU_fcount                  5759326 # cumulative RUU full count
ruu_occupancy               15.3612 # avg RUU occupancy (insn's)
ruu_rate                     1.8972 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0967 # avg RUU occupant latency (cycle's)
ruu_full                     0.8310 # fraction of time (cycle's) RUU was full
LSQ_count                  32578458 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7006 # avg LSQ occupancy (insn's)
lsq_rate                     1.8972 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4776 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  156091913 # total number of slip cycles
avg_sim_slip                11.9089 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
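Most derived statistics above are simple ratios of the raw counters, and the queue latencies follow Little's law (average latency = average occupancy / arrival rate). The sketch below recomputes them from the matrix run's counters; the function name is hypothetical, and the formulas are inferred from the numbers in this log (e.g., ifq_latency = 3.7219 / 1.8972 = 1.9618), not read from sim-outorder's source. In this log all three queue rates equal sim_exec_BW, so each rate is taken as sim_total_insn / sim_cycle.

```python
def derived_core_stats(s: dict) -> dict:
    """Recompute sim-outorder's derived pipeline statistics from raw counters."""
    cycles = s["sim_cycle"]
    rate = s["sim_total_insn"] / cycles       # insts entering each queue per cycle
    out = {
        "sim_IPC": s["sim_num_insn"] / cycles,
        "sim_CPI": cycles / s["sim_num_insn"],
        "sim_exec_BW": rate,
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "avg_sim_slip": s["sim_slip"] / s["sim_num_insn"],
    }
    for q in ("IFQ", "RUU", "LSQ"):
        occupancy = s[f"{q}_count"] / cycles  # average entries in use
        out[f"{q.lower()}_occupancy"] = occupancy
        out[f"{q.lower()}_latency"] = occupancy / rate   # Little's law
        out[f"{q.lower()}_full"] = s[f"{q}_fcount"] / cycles
    return out
```

Fed the matrix counters (sim_num_insn 13107124, sim_cycle 6930681, IFQ_count 25795072, and so on), it reproduces sim_IPC 1.8912, ruu_latency 8.0967 and the other rates to the four decimals printed.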

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:31:35 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264929 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6519257 # total simulation time in cycles
sim_IPC                      1.7765 # instructions per cycle
sim_CPI                      0.5629 # cycles per instruction
sim_exec_BW                  1.8813 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  19336953 # cumulative IFQ occupancy
IFQ_fcount                  4006079 # cumulative IFQ full count
ifq_occupancy                2.9661 # avg IFQ occupancy (insn's)
ifq_rate                     1.8813 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5766 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6145 # fraction of time (cycle's) IFQ was full
RUU_count                  79595231 # cumulative RUU occupancy
RUU_fcount                  3404101 # cumulative RUU full count
ruu_occupancy               12.2092 # avg RUU occupancy (insn's)
ruu_rate                     1.8813 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4897 # avg RUU occupant latency (cycle's)
ruu_full                     0.5222 # fraction of time (cycle's) RUU was full
LSQ_count                  32651680 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0085 # avg LSQ occupancy (insn's)
lsq_rate                     1.8813 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6622 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126314020 # total number of slip cycles
avg_sim_slip                10.9065 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820212 # total number of accesses
il1.hits                   12819995 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820212 # total number of accesses
itlb.hits                  12820205 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917906 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
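The branch-predictor and cache figures for the sort run can be cross-checked the same way. In both runs shown here, bpred misses equal updates minus dir_hits, and ul2.accesses equals il1.misses + dl1.misses + dl1.writebacks (217 + 11,205 + 7,079 = 18,501 for sort; 178 + 5,727 + 492 = 6,397 for matrix). That is consistent with il2 being pointed at dl2, so that instruction misses and dirty L1 data evictions both reach the unified L2. The helpers below are hypothetical; `ratio` returns None where sim-outorder prints `<error: divide by zero>', as it does for the matrix run's non-RAS JR rate.

```python
from typing import Optional


def ratio(num: int, den: int) -> Optional[float]:
    """num/den, or None where sim-outorder prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


def bpred_rates(s: dict) -> dict:
    """Derived branch-predictor figures from the raw bpred_* counters."""
    return {
        "misses": s["updates"] - s["dir_hits"],   # matches both runs in this log
        "bpred_addr_rate": ratio(s["addr_hits"], s["updates"]),
        "bpred_dir_rate": ratio(s["dir_hits"], s["updates"]),
        "bpred_jr_rate": ratio(s["jr_hits"], s["jr_seen"]),
        "bpred_jr_non_ras_rate": ratio(s["jr_non_ras_hits"], s["jr_non_ras_seen"]),
        "ras_rate": ratio(s["ras_hits"], s["used_ras"]),
    }


def unified_l2_accesses(il1_misses: int, dl1_misses: int, dl1_writebacks: int) -> int:
    """Expected ul2 accesses when il2 points at dl2 (inferred accounting)."""
    return il1_misses + dl1_misses + dl1_writebacks
```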

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:31:43 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache, in bytes
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
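
  A cache's total capacity follows from the numeric fields: nsets x assoc x
  bsize bytes.  The hypothetical Python parser below applies this to the dl1
  and ul2 configurations used in these runs (512 sets x 2 ways x 64 bytes =
  64 KiB; 1024 sets x 8 ways x 64 bytes = 512 KiB).  For a TLB specification
  such as dtlb:128:4096:32:r, the same product presumably gives the TLB's
  reach (entries x page size) rather than storage, assuming bsize is the
  page size there.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # sets x ways x bytes per block
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(text: str) -> CacheConfig:
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement strategy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KiB", cfg.repl)
```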

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
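
  The pointer forms act as aliases: a level set to "dl1" or "dl2" uses that
  data-side cache instead of its own.  The hypothetical sketch below encodes
  the values the option dumps allow (-cache:il1 {<config>|dl1|dl2|none},
  -cache:il2 {<config>|dl2|none}) and reports which level actually owns each
  cache.  It only checks this option syntax and does not reproduce any other
  consistency checks the simulator may perform.

```python
from typing import Dict, Optional

# Data-side levels each instruction-side option may point at,
# per the option dumps: il1 {<config>|dl1|dl2|none}, il2 {<config>|dl2|none}.
ALLOWED_POINTERS = {"il1": ("dl1", "dl2"), "il2": ("dl2",)}


def cache_owners(opts: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each level to the level whose cache it actually uses (None if absent)."""
    owners: Dict[str, Optional[str]] = {}
    for level in ("dl1", "dl2"):
        owners[level] = None if opts.get(level, "none") == "none" else level
    for level, allowed in ALLOWED_POINTERS.items():
        value = opts.get(level, "none")
        if value == "none":
            owners[level] = None
        elif value in allowed:
            owners[level] = owners[value]  # shares the data-side cache
        elif value in ("dl1", "dl2"):
            raise ValueError(f"-cache:{level} may not point at {value}")
        else:
            owners[level] = level  # a private <config> string
    return owners


unified_l2 = {"il1": "il1:128:64:1:l", "il2": "dl2",
              "dl1": "dl1:256:32:1:l", "dl2": "ul2:1024:64:2:l"}
print(cache_owners(unified_l2))
```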



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375569 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10419743 # total simulation time in cycles
sim_IPC                      1.2784 # instructions per cycle
sim_CPI                      0.7822 # cycles per instruction
sim_exec_BW                  1.2837 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  40546262 # cumulative IFQ occupancy
IFQ_fcount                  9985921 # cumulative IFQ full count
ifq_occupancy                3.8913 # avg IFQ occupancy (insn's)
ifq_rate                     1.2837 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.0314 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9584 # fraction of time (cycle's) IFQ was full
RUU_count                 163110858 # cumulative RUU occupancy
RUU_fcount                  9844891 # cumulative RUU full count
ruu_occupancy               15.6540 # avg RUU occupancy (insn's)
ruu_rate                     1.2837 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.1947 # avg RUU occupant latency (cycle's)
ruu_full                     0.9448 # fraction of time (cycle's) RUU was full
LSQ_count                  86836594 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3339 # avg LSQ occupancy (insn's)
lsq_rate                     1.2837 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4922 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  269661669 # total number of slip cycles
avg_sim_slip                20.2435 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
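
Each statistics line above follows a "name value # description" layout, so
the block is easy to post-process.  The Python sketch below is one
hypothetical way to do it (parse_stats and parse_value are illustrative
names): it recovers integers, floats, hex addresses and raw strings such as
760k, then cross-checks derived figures against what sim-outorder printed for
the first run in this excerpt, e.g. IPC = sim_num_insn / sim_cycle =
13320908 / 10419743, about 1.2784, and dl1.miss_rate = 287722 / 6169439,
about 0.0466.

```python
import re

# "name  value # description"; the value can contain spaces
# ("<error: divide by zero>"), so match it lazily up to the " # ".
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+# (.*)$")


def parse_value(text):
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    if text.startswith("0x"):
        return int(text, 16)
    return text  # e.g. "760k" or "<error: divide by zero>"


def parse_stats(log_text):
    """Collect the statistics block that follows the statistics banner."""
    stats, in_block = {}, False
    for line in log_text.splitlines():
        if line.startswith("sim: ** simulation statistics **"):
            in_block = True
            continue
        if in_block:
            match = STAT_LINE.match(line)
            if match is None:
                break  # a blank or separator line ends the block
            stats[match.group(1)] = parse_value(match.group(2))
    return stats


SAMPLE = """\
sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_cycle                  10419743 # total simulation time in cycles
sim_IPC                      1.2784 # instructions per cycle
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   760k # total size of memory pages allocated
"""

stats = parse_stats(SAMPLE)
print(f"IPC {stats['sim_num_insn'] / stats['sim_cycle']:.4f} vs reported {stats['sim_IPC']}")
```

Because the printed rates are rounded to four decimals, recomputed values
should agree with them to within about 5e-5.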

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:31:54 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12704186 # total simulation time in cycles
sim_IPC                      1.6829 # instructions per cycle
sim_CPI                      0.5942 # cycles per instruction
sim_exec_BW                  1.6884 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49627430 # cumulative IFQ occupancy
IFQ_fcount                 11787685 # cumulative IFQ full count
ifq_occupancy                3.9064 # avg IFQ occupancy (insn's)
ifq_rate                     1.6884 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3136 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 201649091 # cumulative RUU occupancy
RUU_fcount                 12567560 # cumulative RUU full count
ruu_occupancy               15.8726 # avg RUU occupancy (insn's)
ruu_rate                     1.6884 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.4008 # avg RUU occupant latency (cycle's)
ruu_full                     0.9892 # fraction of time (cycle's) RUU was full
LSQ_count                  64632347 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0875 # avg LSQ occupancy (insn's)
lsq_rate                     1.6884 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0131 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  294063903 # total number of slip cycles
avg_sim_slip                13.7546 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
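
For the filter run, a rough average data-access time can be estimated from
its options and cache statistics with the textbook two-level formula
AMAT = t_dl1 + m_dl1 x (t_ul2 + m_ul2 x t_mem).  The Python sketch below
assumes memory latency is first_chunk + inter_chunk x (chunks - 1), with
chunks = ceil(block size / bus width), i.e. 69 + 2 x 1 = 71 cycles for a
64-byte block on a 32-byte bus; the effect of -mem:maxBurstLength and
-mem:minBurstLength is not modeled.  The result is a serial estimate, not a
figure sim-outorder reports: it ignores TLB misses and overlapping misses,
and ul2.miss_rate also covers il1 misses and dl1 writebacks (3131 + 179 + 791
= 4101 ul2 accesses), so it only approximates the miss rate seen by data.

```python
import math


def mem_latency(block_bytes, bus_bytes, first_chunk, inter_chunk):
    # Assumed model: one transfer per bus-width chunk, the first costing
    # first_chunk cycles and each later one inter_chunk cycles.
    chunks = math.ceil(block_bytes / bus_bytes)
    return first_chunk + inter_chunk * (chunks - 1)


def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem):
    # Textbook two-level average memory access time; assumes misses are
    # serviced one at a time, which an out-of-order core does not do.
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem)


# Values from the filter run: -cache:dl1lat 3, -cache:dl2lat 12,
# -mem:lat 69 2, -mem:width 32, 64-byte ul2 blocks.
mem = mem_latency(64, 32, 69, 2)
dl1_miss = 3131 / 4943189  # dl1.misses / dl1.accesses
ul2_miss = 2489 / 4101     # ul2.misses / ul2.accesses
print(f"memory: {mem} cycles, data AMAT ~ {amat(3, dl1_miss, 12, ul2_miss, mem):.3f} cycles")
```

With a dl1 miss rate of only about 0.06%, the estimate stays close to the
3-cycle dl1 hit latency even though most ul2 accesses miss.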

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:32:08 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
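The N, M, W, X parameters become concrete in the minimal generic two-level predictor sketched below. It has N first-level history registers of W bits each and M second-level 2-bit saturating counters. The second-level index is formed by concatenating address bits above the history or, when X is 1, by XORing them with it (gshare-style). This is an illustrative model, not SimpleScalar's bpred code. The 3-bit PC shift assumes 8-byte PISA instructions, and the initial counter value is an arbitrary choice. Under these assumptions, the `-bpred:2lev 1 1024 8 0` default shown in the option listings corresponds to `TwoLevelPredictor(n=1, m=1024, w=8, xor=False)`.

```python
class TwoLevelPredictor:
    """Generic two-level predictor: n history registers of w bits, m 2-bit counters."""

    def __init__(self, n: int, m: int, w: int, xor: bool, pc_shift: int = 3):
        for size in (n, m):
            if size <= 0 or size & (size - 1):
                raise ValueError("table sizes must be powers of two")
        self.n, self.m, self.w, self.xor = n, m, w, xor
        self.pc_shift = pc_shift     # assumption: 8-byte PISA instructions
        self.history = [0] * n       # first level: branch history shift registers
        self.counters = [2] * m      # second level: 2-bit counters, start weakly taken

    def _indices(self, pc: int):
        addr = pc >> self.pc_shift
        l1 = addr & (self.n - 1)                           # which history register
        hist = self.history[l1]
        if self.xor:
            l2 = (hist ^ addr) & (self.m - 1)              # gshare-style index
        else:
            l2 = ((addr << self.w) | hist) & (self.m - 1)  # address bits above history
        return l1, l2

    def predict(self, pc: int) -> bool:
        _, l2 = self._indices(pc)
        return self.counters[l2] >= 2

    def update(self, pc: int, taken: bool) -> None:
        l1, l2 = self._indices(pc)
        c = self.counters[l2]
        self.counters[l2] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.w) - 1
        self.history[l1] = ((self.history[l1] << 1) | int(taken)) & mask


# GAg with W = 4 (N = 1, M = 2^W) on a strictly alternating branch.
p = TwoLevelPredictor(n=1, m=16, w=4, xor=False)
pc = 0x00400140
hits = 0
for k in range(200):
    taken = k % 2 == 0
    hits += p.predict(pc) == taken
    p.update(pc, taken)
print(hits, "of 200 correct")
```

In this run, the predictor mispredicts only during warm-up. Once the history register holds the alternating pattern, each pattern selects its own counter.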

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
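Every field of `<config>` is positional, so the total size of a cache is only implicit. The small parser sketched below makes it explicit as capacity = nsets x assoc x bsize, assuming `<bsize>` is in bytes. The helper names are hypothetical. For the TLB configurations, the block size is the page size, so the same product gives the TLB reach.

```python
from dataclasses import dataclass

_REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block (or page) size in bytes
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total capacity = sets x ways x block size.
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>', e.g. 'dl1:512:64:2:l'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in _REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), _REPL[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KB", cfg.repl)
```

For the configurations used in these runs, this gives a 64 KB two-way dl1 and a 512 KB eight-way unified L2.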

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
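When il2 is pointed at dl2, the unified L2 receives three streams: il1 misses, dl1 misses, and dl1 writebacks of dirty blocks. The counters in the two complete statistics blocks of this log match that accounting exactly. In the first run, 211 + 2572213 + 957274 = 3529698 `ul2.accesses`. In the `matrix` run, 178 + 5727 + 492 = 6397. The check below treats this as an identity observed for these runs, not a documented guarantee, and counts only those three cache streams. The function name is hypothetical.

```python
def expected_unified_l2_accesses(il1_misses: int, dl1_misses: int,
                                 dl1_writebacks: int) -> int:
    # With il2 -> dl2, every L1 miss (instruction or data) and every dirty
    # dl1 eviction is sent to the shared L2; il1 holds no dirty data.
    return il1_misses + dl1_misses + dl1_writebacks


runs = {
    # (il1.misses, dl1.misses, dl1.writebacks, ul2.accesses) copied from the log
    "first run": (211, 2572213, 957274, 3529698),
    "matrix": (178, 5727, 492, 6397),
}
for label, (im, dm, wb, ul2) in runs.items():
    predicted = expected_unified_l2_accesses(im, dm, wb)
    print(label, predicted, ul2, predicted == ul2)
```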



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861699 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  40146567 # total simulation time in cycles
sim_IPC                      0.6939 # instructions per cycle
sim_CPI                      1.4410 # cycles per instruction
sim_exec_BW                  0.6940 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 160463465 # cumulative IFQ occupancy
IFQ_fcount                 40115626 # cumulative IFQ full count
ifq_occupancy                3.9969 # avg IFQ occupancy (insn's)
ifq_rate                     0.6940 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.7593 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 641859002 # cumulative RUU occupancy
RUU_fcount                 40114778 # cumulative RUU full count
ruu_occupancy               15.9879 # avg RUU occupancy (insn's)
ruu_rate                     0.6940 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.0373 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 195027385 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8579 # avg LSQ occupancy (insn's)
lsq_rate                     0.6940 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.9998 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  873394822 # total number of slip cycles
avg_sim_slip                31.3498 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
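The rate lines in the statistics block above are ratios of the raw counters printed with them. The sketch below parses `<name> <value> # <description>` lines and recomputes a few of those rates for this run. The regex is an assumption about the layout, based on the lines shown. The results should agree with the printed `sim_IPC`, `sim_CPI`, `dl1.miss_rate` and `bpred_bimod.bpred_dir_rate` to four decimal places. Non-numeric values, such as hex segment bases, `1480k` or `<error: divide by zero>`, are skipped.

```python
import re

# "<name>   <value> # <description>" as printed in the statistics blocks.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> dict:
    """Map stat names to float values, skipping entries that are not numbers."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m is None:
            continue
        try:
            stats[m.group(1)] = float(m.group(2))
        except ValueError:
            pass  # hex addresses, '1480k', and similar
    return stats


# A few lines copied from the statistics block above.
SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  40146567 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
"""

s = parse_stats(SAMPLE)
ipc = s["sim_num_insn"] / s["sim_cycle"]
cpi = s["sim_cycle"] / s["sim_num_insn"]
dl1_miss_rate = s["dl1.misses"] / s["dl1.accesses"]
dir_rate = s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"]
print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  dl1 miss {dl1_miss_rate:.4f}  dir {dir_rate:.4f}")
```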

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:32:32 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6795234 # total simulation time in cycles
sim_IPC                      1.9289 # instructions per cycle
sim_CPI                      0.5184 # cycles per instruction
sim_exec_BW                  1.9350 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25301584 # cumulative IFQ occupancy
IFQ_fcount                  6199793 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insn's)
ifq_rate                     1.9350 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9242 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104484071 # cumulative RUU occupancy
RUU_fcount                  5635954 # cumulative RUU full count
ruu_occupancy               15.3761 # avg RUU occupancy (insn's)
ruu_rate                     1.9350 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9462 # avg RUU occupant latency (cycle's)
ruu_full                     0.8294 # fraction of time (cycle's) RUU was full
LSQ_count                  31917162 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6970 # avg LSQ occupancy (insn's)
lsq_rate                     1.9350 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4273 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153452525 # total number of slip cycles
avg_sim_slip                11.7076 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
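The IFQ, RUU and LSQ lines in the `matrix` run above fit three definitions:

- occupancy = `*_count` / `sim_cycle`
- dispatch rate = `sim_total_insn` / `sim_cycle`
- latency = `*_count` / `sim_total_insn`

Taken together, these give occupancy = rate x latency, which is Little's law. The definitions are inferred from the printed numbers, not taken from the simulator source. The sketch also returns `None` wherever the simulator prints `<error: divide by zero>`, as it does for `bpred_jr_non_ras_rate.PP` when no non-RAS JRs were seen. Both helper names are hypothetical.

```python
from typing import Optional


def safe_ratio(num: float, den: float) -> Optional[float]:
    # Where sim-outorder prints '<error: divide by zero>', return None instead.
    return None if den == 0 else num / den


def queue_stats(count: int, total_insn: int, cycles: int) -> dict:
    """Occupancy, dispatch rate and latency from a cumulative occupancy count."""
    occupancy = safe_ratio(count, cycles)      # average entries per cycle
    rate = safe_ratio(total_insn, cycles)      # instructions entering per cycle
    latency = safe_ratio(count, total_insn)    # average cycles each one stays
    return {"occupancy": occupancy, "rate": rate, "latency": latency}


# matrix run: sim_total_insn, sim_cycle, and the cumulative *_count totals.
TOTAL_INSN, CYCLES = 13148990, 6795234
for name, count in (("IFQ", 25301584), ("RUU", 104484071), ("LSQ", 31917162)):
    q = queue_stats(count, TOTAL_INSN, CYCLES)
    # Little's law: occupancy == rate * latency (up to floating-point error).
    assert abs(q["occupancy"] - q["rate"] * q["latency"]) < 1e-9
    print(name, {k: round(v, 4) for k, v in q.items()})

print(safe_ratio(0, 0))  # the non-RAS JR rate in the matrix run
```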

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:32:40 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
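
The range grammar above can be checked with a small parser. This is a minimal sketch, assuming decimal or 0x-prefixed numbers and a caller-supplied symbol table (the address given for `main' in the example is hypothetical). It rejects ranges whose two ends use different kinds, which is this sketch's own choice. parse_range and TraceRange are illustrative names, not SimpleScalar functions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Prefix characters from the range syntax; no prefix means instruction count.
KINDS = {"@": "address", "#": "cycle"}


@dataclass
class TraceRange:
    kind: str                 # "address", "cycle" or "insn"
    start: Optional[int]      # None: from the beginning of execution
    end: Optional[int]        # None: to the end of execution


def _value(text: str, symbols: Dict[str, int]) -> int:
    """Decimal or 0x-hex literal, otherwise a program symbol."""
    try:
        return int(text, 0)
    except ValueError:
        if text in symbols:
            return symbols[text]
        raise ValueError(f"unknown number or symbol {text!r}")


def _kind_and_body(part: str) -> Tuple[str, str]:
    if part[:1] in KINDS:
        return KINDS[part[0]], part[1:]
    return "insn", part


def parse_range(spec: str, symbols: Optional[Dict[str, int]] = None) -> TraceRange:
    symbols = symbols or {}
    if ":" not in spec:
        raise ValueError("a range needs a ':' separator")
    lo, hi = spec.split(":", 1)
    kind, start, end = "insn", None, None
    if lo:
        kind, body = _kind_and_body(lo)
        start = _value(body, symbols)
    if hi.startswith("+"):
        # Relative end: an offset from the start, same kind as the start.
        if start is None:
            raise ValueError("a '+' end needs an explicit start")
        end = start + _value(hi[1:], symbols)
    elif hi:
        hi_kind, body = _kind_and_body(hi)
        if lo and hi_kind != kind:
            raise ValueError("both ends must use the same kind of range")
        kind = hi_kind        # an open start takes the end's kind
        end = _value(body, symbols)
    return TraceRange(kind, start, end)


if __name__ == "__main__":
    syms = {"main": 0x00400800}   # hypothetical address for `main'
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
        print(f"{spec:12} -> {parse_range(spec, syms)}")
```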

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
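
The -bpred:2lev option takes its four values in the order <l1size> <l2size> <hist_size> <xor> (N, M, W, X), while the sample rows above list them as N, W, M, X. The sketch below is a simplified two-level predictor built around those four parameters to show how they interact: the branch address picks one of N history registers, and its W history bits (optionally XORed with address bits) index M two-bit counters. It is not SimpleScalar's bpred.c; the 8-byte instruction shift and the initial counter value are assumptions.

```python
class TwoLevelPredictor:
    """Simplified two-level branch direction predictor (illustrative only).

    l1size    -- N, number of branch-history shift registers (first level)
    l2size    -- M, number of 2-bit saturating counters (second level)
    hist_size -- W, width of each shift register in bits
    xor       -- X, XOR history with branch address bits (gshare-style)
    """

    def __init__(self, l1size: int, l2size: int, hist_size: int, xor: bool):
        for name, v in (("l1size", l1size), ("l2size", l2size)):
            if v <= 0 or v & (v - 1):
                raise ValueError(f"{name} must be a power of two")
        self.l1size, self.l2size = l1size, l2size
        self.hist_size, self.xor = hist_size, xor
        self.hist_mask = (1 << hist_size) - 1
        self.history = [0] * l1size      # first-level shift registers
        self.counters = [2] * l2size     # start weakly taken (arbitrary choice)

    def _indices(self, pc: int):
        addr = pc >> 3                   # assumes 8-byte PISA instructions
        l1 = addr & (self.l1size - 1)    # which history register to use
        hist = self.history[l1]
        if self.xor:
            hist = (hist ^ addr) & self.hist_mask
        # Address bits above the history bits select a per-address table
        # when l2size > 2^hist_size (GAp/PAp); the mask drops them otherwise.
        l2 = (hist | (addr << self.hist_size)) & (self.l2size - 1)
        return l1, l2

    def predict(self, pc: int) -> bool:
        """True means predict taken."""
        return self.counters[self._indices(pc)[1]] >= 2

    def update(self, pc: int, taken: bool) -> None:
        l1, l2 = self._indices(pc)
        c = self.counters[l2]
        self.counters[l2] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the resolved outcome into this branch's history register.
        self.history[l1] = ((self.history[l1] << 1) | int(taken)) & self.hist_mask


# GAg row "1, W, 2^W, 0" with W = 2, in -bpred:2lev order: learns a
# strictly alternating branch.
gag = TwoLevelPredictor(l1size=1, l2size=4, hist_size=2, xor=False)
pc = 0x00400140                          # any branch address
for i in range(10):
    gag.update(pc, taken=(i % 2 == 0))   # T, N, T, N, ...
print(gag.predict(pc))                   # True: the next outcome is taken

# The gshare row "1, W, 2^W, 1" with W = 8 would be
# TwoLevelPredictor(l1size=1, l2size=256, hist_size=8, xor=True).
```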

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
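
Because a cache's capacity is the product of its set count, block size and associativity, the <config> strings and unification pointers above are easy to check mechanically. A minimal sketch, assuming only the format and the "dl1"/"dl2" pointer rule described here: each level is parsed, and a unified instruction level shares the same object as the data cache it points at. parse_cache and resolve are hypothetical helper names, not part of SimpleScalar.

```python
from dataclasses import dataclass
from typing import Dict, Optional

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int          # block size in bytes
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total data capacity in bytes: sets x block size x associativity."""
        return self.nsets * self.bsize * self.assoc


def parse_cache(spec: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


def resolve(opts: Dict[str, str]) -> Dict[str, Optional[CacheConfig]]:
    """Build dl1/dl2/il1/il2; "dl1"/"dl2" on an il level means a shared cache.
    A missing level is treated as "none" (a choice of this sketch)."""
    caches: Dict[str, Optional[CacheConfig]] = {}
    for level in ("dl1", "dl2", "il1", "il2"):   # data levels first
        spec = opts.get(level, "none")
        if spec == "none":
            caches[level] = None
        elif spec in ("dl1", "dl2"):
            if not level.startswith("il") or (level == "il2" and spec == "dl1"):
                raise ValueError(f"{level} cannot point at {spec}")
            caches[level] = caches[spec]          # unified: same object
        else:
            caches[level] = parse_cache(spec)
    return caches


cfg = resolve({"dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l",
               "il1": "il1:512:64:2:l", "il2": "dl2"})
print(cfg["dl1"].capacity)       # 65536 bytes (64 KiB)
print(cfg["dl2"].capacity)       # 524288 bytes (512 KiB)
print(cfg["il2"] is cfg["dl2"])  # True: il2 and dl2 are one unified cache
```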



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264653 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6331415 # total simulation time in cycles
sim_IPC                      1.8292 # instructions per cycle
sim_CPI                      0.5467 # cycles per instruction
sim_exec_BW                  1.9371 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18644375 # cumulative IFQ occupancy
IFQ_fcount                  3832934 # cumulative IFQ full count
ifq_occupancy                2.9447 # avg IFQ occupancy (insn's)
ifq_rate                     1.9371 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5202 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6054 # fraction of time (cycle's) IFQ was full
RUU_count                  76823323 # cumulative RUU occupancy
RUU_fcount                  3231021 # cumulative RUU full count
ruu_occupancy               12.1337 # avg RUU occupancy (insn's)
ruu_rate                     1.9371 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2638 # avg RUU occupant latency (cycle's)
ruu_full                     0.5103 # fraction of time (cycle's) RUU was full
LSQ_count                  32026513 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0583 # avg LSQ occupancy (insn's)
lsq_rate                     1.9371 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6113 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122916932 # total number of slip cycles
avg_sim_slip                10.6132 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:32:48 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375020 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9124462 # total simulation time in cycles
sim_IPC                      1.4599 # instructions per cycle
sim_CPI                      0.6850 # cycles per instruction
sim_exec_BW                  1.4658 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  35562210 # cumulative IFQ occupancy
IFQ_fcount                  8739908 # cumulative IFQ full count
ifq_occupancy                3.8975 # avg IFQ occupancy (insn's)
ifq_rate                     1.4658 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6589 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 143158534 # cumulative RUU occupancy
RUU_fcount                  8599012 # cumulative RUU full count
ruu_occupancy               15.6895 # avg RUU occupancy (insn's)
ruu_rate                     1.4658 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.7034 # avg RUU occupant latency (cycle's)
ruu_full                     0.9424 # fraction of time (cycle's) RUU was full
LSQ_count                  74962824 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2156 # avg LSQ occupancy (insn's)
lsq_rate                     1.4658 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6047 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  237836944 # total number of slip cycles
avg_sim_slip                17.8544 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:32:59 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
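
The -mem:lat and -mem:width settings above can be turned into a rough per-block memory cost. This sketch assumes the stock sim-outorder memory model: a cache block moves over the bus in bus-width chunks, the first chunk takes <first_chunk> cycles, and each further chunk takes <inter_chunk> cycles. That this modified build behaves the same way is an assumption. It also exposes -mem:maxBurstLength and -mem:minBurstLength, which the sketch ignores. The function name block_miss_latency is hypothetical.

```python
def block_miss_latency(first_chunk: int, inter_chunk: int,
                       block_bytes: int, bus_width_bytes: int) -> int:
    """Cycles to fetch one cache block from main memory.

    Assumes the stock sim-outorder model: the block is split into
    bus-width chunks; the first costs `first_chunk` cycles and every
    later chunk costs `inter_chunk` cycles. Burst-length options are
    not modelled here.
    """
    # Number of bus transfers needed for one block (ceiling division).
    chunks = (block_bytes + bus_width_bytes - 1) // bus_width_bytes
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes in these runs (ul2:1024:64:8:l).
print(block_miss_latency(69, 2, 64, 64))  # -mem:lat 69 2, -mem:width 64 -> 69
print(block_miss_latency(62, 2, 64, 8))   # -mem:lat 62 2, -mem:width 8  -> 76
```

With a 64-byte bus, a 64-byte block arrives in one transfer, so only the first-chunk latency applies. With an 8-byte bus, the same block needs eight transfers.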

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  If
  the second argument is prefixed with a `+', it is a value relative to the
  first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be
  used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
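
The range grammar can be made concrete with a small parser. This is a sketch and not SimpleScalar's own parser. The names parse_ptrace_range and RangePoint are hypothetical. Numbers are read with Python's int(x, 0), so 0x-prefixed values are hex and bare digits are decimal, which is an assumption about how the simulator reads them. Symbols such as main are looked up in a caller-supplied table rather than the program's real symbol table.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class RangePoint(NamedTuple):
    kind: str   # "addr" (@ prefix), "cycle" (# prefix) or "insn" (no prefix)
    value: int


def _parse_point(text: str, symbols: Dict[str, int]) -> RangePoint:
    kind = "insn"
    if text.startswith("@"):
        kind, text = "addr", text[1:]
    elif text.startswith("#"):
        kind, text = "cycle", text[1:]
    # Symbols (e.g. "main") come from a caller-supplied table in this sketch.
    value = symbols[text] if text in symbols else int(text, 0)
    return RangePoint(kind, value)


def parse_ptrace_range(
    spec: str, symbols: Optional[Dict[str, int]] = None
) -> Tuple[Optional[RangePoint], Optional[RangePoint]]:
    """Parse '{{@|#}<start>}:{{@|#|+}<end>}'; None marks an open end."""
    symbols = symbols or {}
    start_text, end_text = spec.split(":", 1)
    start = _parse_point(start_text, symbols) if start_text else None
    if end_text.startswith("+"):
        if start is None:
            raise ValueError("a relative end (+N) needs a start point")
        # Relative end: same kind as the start, offset by N.
        end = RangePoint(start.kind, start.value + int(end_text[1:], 0))
    elif end_text:
        end = _parse_point(end_text, symbols)
    else:
        end = None
    return start, end


print(parse_ptrace_range("#0:#1000"))
print(parse_ptrace_range("1000:+100") == parse_ptrace_range("1000:1100"))
```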

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
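
Note that -bpred:2lev takes its arguments as <l1size> <l2size> <hist_size> <xor> (see the option list), which is N, M, W, X in the notation above. The hypothetical helper below builds that argument string for the sample schemes whose second level has exactly 2^W counters. GAp and PAp are left out because their second-level size is chosen separately.

```python
def two_level_args(scheme: str, hist_width: int, l1_entries: int = 1) -> str:
    """Return sim-outorder flags for a 2-level predictor whose second
    level has 2**hist_width counters (GAg, PAg, gshare).

    -bpred:2lev expects <l1size> <l2size> <hist_size> <xor> = N M W X.
    """
    l2_entries = 2 ** hist_width
    if scheme == "GAg":
        l1, xor = 1, 0            # one global history register
    elif scheme == "gshare":
        l1, xor = 1, 1            # global history XORed with branch address
    elif scheme == "PAg":
        l1, xor = l1_entries, 0   # per-address history registers
    else:
        raise ValueError(f"unsupported scheme: {scheme}")
    return f"-bpred 2lev -bpred:2lev {l1} {l2_entries} {hist_width} {xor}"


print(two_level_args("gshare", 10))  # -bpred 2lev -bpred:2lev 1 1024 10 1
print(two_level_args("PAg", 8, l1_entries=1024))
```

The default shown in this log, -bpred:2lev 1 1024 8 0, has M = 1024 > 2^8, which makes it a GAp configuration by the table above.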

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
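
The total size of a cache follows directly from the <config> fields: nsets * bsize * assoc bytes. The sketch below applies this to configurations used in these runs. parse_cache_config and CacheConfig are hypothetical names. For a TLB the block size is the page size, so the same product gives the memory reach of the TLB rather than its storage.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int      # block size in bytes (page size for a TLB)
    assoc: int
    repl: str       # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity_bytes(self) -> int:
        # sets x blocks per set x bytes per block
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy: {repl}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KiB")
```

For these runs this gives a 64 KiB dl1, a 512 KiB ul2, and a 128-entry dtlb that maps 512 KiB of 4 KiB pages.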

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12549212 # total simulation time in cycles
sim_IPC                      1.7036 # instructions per cycle
sim_CPI                      0.5870 # cycles per instruction
sim_exec_BW                  1.7093 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49055558 # cumulative IFQ occupancy
IFQ_fcount                 11644717 # cumulative IFQ full count
ifq_occupancy                3.9091 # avg IFQ occupancy (insn's)
ifq_rate                     1.7093 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2870 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199358222 # cumulative RUU occupancy
RUU_fcount                 12424592 # cumulative RUU full count
ruu_occupancy               15.8861 # avg RUU occupancy (insn's)
ruu_rate                     1.7093 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2940 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63913919 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0931 # avg LSQ occupancy (insn's)
lsq_rate                     1.7093 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9797 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291055020 # total number of slip cycles
avg_sim_slip                13.6139 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
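
Several of the statistics above are ratios of others, so they are easy to cross-check. The sketch below parses `name value # description` lines and recomputes IPC and the dl1 miss rate from a few lines of this run. parse_stats is a hypothetical helper, and it skips non-numeric values such as hex addresses, `120k`, or `<error: divide by zero>`. The sketch also checks an identity that holds for both complete runs in this log: because il2 points at dl2, ul2.accesses equals il1.misses + dl1.misses + dl1.writebacks (179 + 3131 + 791 = 4101 here).

```python
import re

# "name   value # description"; the value must be a single token.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> dict[str, float]:
    """Map each numeric statistic name to its value."""
    stats: dict[str, float] = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue  # e.g. "<error: divide by zero>" spans several tokens
        try:
            stats[match.group(1)] = float(match.group(2))
        except ValueError:
            pass  # hex addresses, "120k", and other non-numeric values
    return stats


# A few lines copied verbatim from the statistics above.
RUN1 = """\
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12549212 # total simulation time in cycles
sim_IPC                      1.7036 # instructions per cycle
il1.misses                      179 # total number of misses
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
dl1.writebacks                  791 # total number of writebacks
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
ul2.accesses                   4101 # total number of accesses
"""

stats = parse_stats(RUN1)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
# With il2 -> dl2, the unified L2 sees L1 misses from both sides plus dl1 writebacks.
l2_inflow = stats["il1.misses"] + stats["dl1.misses"] + stats["dl1.writebacks"]

print(f"IPC {ipc:.4f} (reported {stats['sim_IPC']:.4f})")
print(f"dl1 miss rate {dl1_miss_rate:.4f} (reported {stats['dl1.miss_rate']:.4f})")
print(f"ul2 accesses {l2_inflow:.0f} (reported {stats['ul2.accesses']:.0f})")
```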

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:33:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861147 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37025766 # total simulation time in cycles
sim_IPC                      0.7524 # instructions per cycle
sim_CPI                      1.3290 # cycles per instruction
sim_exec_BW                  0.7525 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 148037945 # cumulative IFQ occupancy
IFQ_fcount                 37009246 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7525 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3134 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 592153955 # cumulative RUU occupancy
RUU_fcount                 37008536 # cumulative RUU full count
ruu_occupancy               15.9930 # avg RUU occupancy (insn's)
ruu_rate                     0.7525 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.2538 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 180009466 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8617 # avg LSQ occupancy (insn's)
lsq_rate                     0.7525 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4609 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  808672132 # total number of slip cycles
avg_sim_slip                29.0266 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
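
The occupancy, latency, and slip figures above are also derived values. Dividing the cumulative counters by the cycle or instruction totals reproduces the reported numbers for this alphaBlend run. This suggests, though the log does not state it, that the simulator averages occupancy over sim_cycle, latency over sim_total_insn, and slip over sim_num_insn. The dictionary names below are illustrative only.

```python
# Counters copied from the alphaBlend statistics above.
IFQ_fcount = 37_009_246
RUU_count = 592_153_955
sim_slip = 808_672_132
sim_cycle = 37_025_766
sim_num_insn = 27_859_692     # committed instructions
sim_total_insn = 27_861_147   # executed, including mis-speculated

derived = {
    "ifq_full": IFQ_fcount / sim_cycle,         # fraction of cycles IFQ was full
    "ruu_occupancy": RUU_count / sim_cycle,     # avg RUU entries in use per cycle
    "ruu_latency": RUU_count / sim_total_insn,  # avg cycles an insn spends in RUU
    "avg_sim_slip": sim_slip / sim_num_insn,    # avg issue-to-retirement slip
}
reported = {"ifq_full": 0.9996, "ruu_occupancy": 15.9930,
            "ruu_latency": 21.2538, "avg_sim_slip": 29.0266}

for name, value in derived.items():
    print(f"{name:14s} derived {value:.4f}  reported {reported[name]:.4f}")
```

An RUU occupancy of about 15.99 out of 16 entries, with ruu_full at 0.9995, shows that the register update unit is full in almost every cycle of this run.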

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:33:36 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
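A note on the memory options above: in stock sim-outorder, moving a block from memory costs the first `-mem:lat` value for the first `-mem:width`-byte chunk plus the second value for each additional chunk. The Python sketch below applies that rule, assuming this build keeps it. `-mem:maxBurstLength` and `-mem:minBurstLength` are not stock SimpleScalar 3.0 options, so their effect is unknown here and they are deliberately not modeled. The function name is illustrative.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to transfer one block from memory (assumed stock sim-outorder rule).

    The block is split into bus-width chunks: the first chunk costs
    `first_chunk` cycles and every further chunk adds `inter_chunk` cycles.
    Burst-length limits of this modified build are NOT modeled.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (block_bytes + bus_width - 1) // bus_width  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# ul2 blocks are 64 bytes (ul2:1024:64:8:l), bus width 8 bytes, -mem:lat 62 2
print(mem_access_latency(64, 8, 62, 2))
```

With the settings above this gives 62 + 7 * 2 = 76 cycles per 64-byte L2 fill, before any burst-length adjustment this build may apply.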

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle
  count range.  All other range values represent an instruction count
  range.  The second argument, if specified with a `+', indicates a value
  relative to the first argument, e.g., 1000:+100 == 1000:1100.  Program
  symbols may be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
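The range grammar above can be parsed in a few lines. Below is a hypothetical Python parser (`parse_ptrace_range` and `Bound` are illustrative names, not SimpleScalar's own code). It assumes numeric values are decimal unless written with a `0x` prefix, and it leaves program symbols such as `main` unresolved, since resolving them needs the simulated program's symbol table; a `+` offset is folded into the end value only when the start is numeric.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Value = Union[int, str]  # a numeric literal, or an unresolved program symbol


@dataclass(frozen=True)
class Bound:
    kind: str    # "addr" (@), "cycle" (#), "inst" (no prefix), or "rel" (+, end only)
    value: Value


def _parse_value(text: str) -> Value:
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)
    return text  # symbol name such as "main", left for the caller to resolve


def _parse_bound(text: str, allow_rel: bool) -> Optional[Bound]:
    if not text:
        return None  # this end of the range was omitted
    prefixes = {"@": "addr", "#": "cycle"}
    if allow_rel:
        prefixes["+"] = "rel"
    kind = prefixes.get(text[0])
    if kind is None:
        return Bound("inst", _parse_value(text))
    return Bound(kind, _parse_value(text[1:]))


def parse_ptrace_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _parse_bound(start_text, allow_rel=False)
    end = _parse_bound(end_text, allow_rel=True)
    # "+N" is relative to the start (1000:+100 == 1000:1100); it can only be
    # folded in here when both values are numbers, not symbols.
    if (end is not None and end.kind == "rel" and start is not None
            and isinstance(start.value, int) and isinstance(end.value, int)):
        end = Bound(start.kind, start.value + end.value)
    return start, end


for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"]:
    print(spec, parse_ptrace_range(spec))
```

For example, `1000:+100` comes back as instruction counts 1000 and 1100, matching the equivalence stated above, while `@main:+278` keeps its relative end because `main` is still a symbol.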

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
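To make the N/M/W/X parameters concrete, here is a simplified Python model of how a 2-level predictor can form its second-level index from a W-bit history register and the branch address: without xor, address bits are concatenated above the history; with xor (gshare), they are first xor-ed into it; the result is masked to the M-entry table. This is a sketch loosely modeled on SimpleScalar's predictor, not its exact code. `l2_index` is a hypothetical name, M is assumed to be a power of two, and the 3-bit address shift assumes 8-byte PISA instructions.

```python
def l2_index(history: int, pc: int, hist_width: int, l2size: int,
             use_xor: bool, br_shift: int = 3) -> int:
    """Second-level table index for a 2-level predictor (simplified model).

    history    : contents of the selected first-level shift register
    pc         : branch address
    hist_width : W, width of the shift register
    l2size     : M, number of second-level counters (power of two assumed)
    use_xor    : X, xor history with address bits (gshare-style)
    br_shift   : low address bits dropped (3 assumes 8-byte instructions)
    """
    if l2size <= 0 or l2size & (l2size - 1):
        raise ValueError("l2size must be a power of two")
    addr = pc >> br_shift
    hist_mask = (1 << hist_width) - 1
    if use_xor:
        low = (history ^ addr) & hist_mask  # hash address into the history bits
    else:
        low = history & hist_mask
    # Address bits sit above the history bits; masking to M entries keeps only
    # as many of them as fit (none at all when M == 2^W).
    return (low | (addr << hist_width)) & (l2size - 1)


PC = 0x400148      # example branch address in the text segment (illustrative)
HIST = 0b00001010  # example 8-bit history, W = 8
print("GAg   ", l2_index(HIST, PC, 8, 256, False))   # M == 2^W: history only
print("GAp   ", l2_index(HIST, PC, 8, 1024, False))  # M > 2^W: adds address bits
print("gshare", l2_index(HIST, PC, 8, 256, True))    # history xor address
```

The GAg case shows why M == 2^W makes all branches share one table: the address contributes nothing to the index. Growing M beyond 2^W (GAp) or enabling xor (gshare) is what separates branches.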

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
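As a quick check of the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, the hypothetical Python helper below parses a spec and derives capacity as nsets * bsize * assoc (for a TLB, bsize is the page size, so the product is the TLB reach). The address split assumes power-of-two nsets and bsize and the conventional layout of block offset in the low bits, then set index, then tag; that layout is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # bytes per block (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc

    def split_address(self, addr: int) -> Tuple[int, int, int]:
        """Return (tag, set index, block offset); power-of-two sizes assumed."""
        offset_bits = self.bsize.bit_length() - 1
        index_bits = self.nsets.bit_length() - 1
        offset = addr & (self.bsize - 1)
        index = (addr >> offset_bits) & (self.nsets - 1)
        tag = addr >> (offset_bits + index_bits)
        return tag, index, offset


def parse_cache_config(spec: str) -> CacheConfig:
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected <name>:<nsets>:<bsize>:<assoc>:<repl>, got {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL_POLICIES[repl])


for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l"]:
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KiB", cfg.repl)
```

Applied to this run's settings, dl1:512:64:2:l comes out at 64 KiB, ul2:1024:64:8:l at 512 KiB, and itlb:16:4096:4:l covers 256 KiB of address space.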

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6808975 # total simulation time in cycles
sim_IPC                      1.9250 # instructions per cycle
sim_CPI                      0.5195 # cycles per instruction
sim_exec_BW                  1.9311 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25351648 # cumulative IFQ occupancy
IFQ_fcount                  6212309 # cumulative IFQ full count
ifq_occupancy                3.7233 # avg IFQ occupancy (insns)
ifq_rate                     1.9311 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9280 # avg IFQ occupant latency (cycles)
ifq_full                     0.9124 # fraction of time (cycles) IFQ was full
RUU_count                 104684873 # cumulative RUU occupancy
RUU_fcount                  5648470 # cumulative RUU full count
ruu_occupancy               15.3745 # avg RUU occupancy (insns)
ruu_rate                     1.9311 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9614 # avg RUU occupant latency (cycles)
ruu_full                     0.8296 # fraction of time (cycles) RUU was full
LSQ_count                  31984250 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6974 # avg LSQ occupancy (insns)
lsq_rate                     1.9311 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4324 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  153720289 # total number of slip cycles
avg_sim_slip                11.7280 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
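Several of the summary lines above are ratios of raw counters, which makes them easy to re-derive when post-processing many runs. The Python sketch below (function names are illustrative) parses `name value # description` lines and recomputes IPC, CPI, execution bandwidth, instructions per branch and average slip, plus queue occupancy, latency and full fraction. The formulas are inferred from the numbers in this log rather than taken from the simulator source: every `*_rate` line equals sim_exec_BW, so each queue latency works out to occupancy divided by that rate, which is Little's law. This also means the LSQ figures are normalized by all executed instructions, not just loads and stores.

```python
import re

# Matches "<name>  <value> # <description>". Values that are not plain numbers
# (hex segment bases, "96k", "<error: divide by zero>", config strings) fail
# float() and are skipped.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> dict:
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            try:
                stats[match.group(1)] = float(match.group(2))
            except ValueError:
                pass
    return stats


def derived_metrics(s: dict) -> dict:
    cycles, insn, total = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    out = {
        "sim_IPC": insn / cycles,
        "sim_CPI": cycles / insn,
        "sim_exec_BW": total / cycles,         # committed + mis-speculated insts
        "sim_IPB": insn / s["sim_num_branches"],
        "avg_sim_slip": s["sim_slip"] / insn,
    }
    rate = total / cycles                      # every *_rate line equals sim_exec_BW
    for q in ("IFQ", "RUU", "LSQ"):
        occupancy = s[f"{q}_count"] / cycles   # average occupants per cycle (L)
        out[f"{q.lower()}_occupancy"] = occupancy
        out[f"{q.lower()}_latency"] = occupancy / rate  # Little's law: W = L / lambda
        out[f"{q.lower()}_full"] = s[f"{q}_fcount"] / cycles
    return out


# Lines copied from the statistics above.
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_num_branches            1010850 # total number of branches committed
sim_total_insn             13148990 # total number of instructions executed
sim_cycle                   6808975 # total simulation time in cycles
sim_slip                  153720289 # total number of slip cycles
IFQ_count                  25351648 # cumulative IFQ occupancy
IFQ_fcount                  6212309 # cumulative IFQ full count
RUU_count                 104684873 # cumulative RUU occupancy
RUU_fcount                  5648470 # cumulative RUU full count
LSQ_count                  31984250 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
ld_text_base             0x00400000 # program text (code) segment base
"""

for name, value in derived_metrics(parse_stats(SAMPLE)).items():
    print(f"{name:15s} {value:9.4f}")
```

Run on these lines, the recomputed values agree with the reported ones to the four printed decimals, which is a cheap way to catch a truncated or mis-merged output file.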

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:33:44 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle
  count range.  All other range values represent an instruction count
  range.  The second argument, if specified with a `+', indicates a value
  relative to the first argument, e.g., 1000:+100 == 1000:1100.  Program
  symbols may be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264681 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6350469 # total simulation time in cycles
sim_IPC                      1.8237 # instructions per cycle
sim_CPI                      0.5483 # cycles per instruction
sim_exec_BW                  1.9313 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18714627 # cumulative IFQ occupancy
IFQ_fcount                  3850497 # cumulative IFQ full count
ifq_occupancy                2.9470 # avg IFQ occupancy (insns)
ifq_rate                     1.9313 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5259 # avg IFQ occupant latency (cycles)
ifq_full                     0.6063 # fraction of time (cycles) IFQ was full
RUU_count                  77104478 # cumulative RUU occupancy
RUU_fcount                  3248577 # cumulative RUU full count
ruu_occupancy               12.1415 # avg RUU occupancy (insns)
ruu_rate                     1.9313 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2867 # avg RUU occupant latency (cycles)
ruu_full                     0.5115 # fraction of time (cycles) RUU was full
LSQ_count                  32089912 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0532 # avg LSQ occupancy (insns)
lsq_rate                     1.9313 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6164 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  123261486 # total number of slip cycles
avg_sim_slip                10.6430 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
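The cache and branch-predictor counters also fit together in ways that are useful as sanity checks when scripting over these logs. In both runs shown here, ul2.accesses equals dl1.misses + il1.misses + dl1.writebacks (6397 and 18501), bpred misses equal updates minus direction hits, and the JR counts split exactly into RAS-predicted and non-RAS JRs. These are relations observed in this log, not documented invariants, and the Python sketch below (names are illustrative) simply encodes them together with the rate formulas quoted in the stat descriptions. `rate` returns None for a zero denominator, the case printed as `<error: divide by zero>` where an earlier run saw 0 non-RAS JRs.

```python
from typing import Dict, Optional


def rate(hits: float, total: float) -> Optional[float]:
    """hits/total, or None where sim-outorder prints '<error: divide by zero>'."""
    return hits / total if total else None


# Values copied from the sort run's statistics above.
RUN2: Dict[str, int] = {
    "il1.misses": 217,
    "dl1.accesses": 4497784, "dl1.misses": 11205, "dl1.writebacks": 7079,
    "ul2.accesses": 18501, "ul2.misses": 4196,
    "bpred.updates": 3128464, "bpred.addr_hits": 2984765,
    "bpred.dir_hits": 2990777, "bpred.misses": 137687,
    "bpred.jr_hits": 427904, "bpred.jr_seen": 434901,
    "bpred.jr_non_ras_hits": 1223, "bpred.jr_non_ras_seen": 8193,
    "bpred.used_ras": 426708, "bpred.ras_hits": 426681,
}


def consistency_checks(s: Dict[str, int]) -> Dict[str, bool]:
    """Relations observed in this log's runs (not documented invariants)."""
    return {
        # unified L2 traffic = L1 misses + dirty dl1 blocks written back
        "l2_traffic": s["dl1.misses"] + s["il1.misses"] + s["dl1.writebacks"]
                      == s["ul2.accesses"],
        # misses = committed branches whose direction was not predicted correctly
        "bpred_misses": s["bpred.updates"] - s["bpred.dir_hits"] == s["bpred.misses"],
        # each JR is counted either as a RAS prediction or as a non-RAS JR
        "jr_split": s["bpred.used_ras"] + s["bpred.jr_non_ras_seen"] == s["bpred.jr_seen"],
        "jr_hit_split": s["bpred.ras_hits"] + s["bpred.jr_non_ras_hits"] == s["bpred.jr_hits"],
    }


def report_rates(s: Dict[str, int]) -> Dict[str, Optional[float]]:
    return {
        "dl1.miss_rate": rate(s["dl1.misses"], s["dl1.accesses"]),
        "ul2.miss_rate": rate(s["ul2.misses"], s["ul2.accesses"]),
        "bpred_addr_rate": rate(s["bpred.addr_hits"], s["bpred.updates"]),
        "bpred_dir_rate": rate(s["bpred.dir_hits"], s["bpred.updates"]),
        "bpred_jr_rate": rate(s["bpred.jr_hits"], s["bpred.jr_seen"]),
        "bpred_jr_non_ras_rate": rate(s["bpred.jr_non_ras_hits"], s["bpred.jr_non_ras_seen"]),
        "ras_rate": rate(s["bpred.ras_hits"], s["bpred.used_ras"]),
    }


print(consistency_checks(RUN2))
for name, value in report_rates(RUN2).items():
    print(name, None if value is None else round(value, 4))
```

A failed check is not necessarily a simulator bug, but it is a good signal that the counters come from different runs or that a modified build changed the bookkeeping.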

--------------------------------------------------------------------------------

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:33:52 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
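The memory options above (`-mem:lat 62 2`, `-mem:width 8`) and the 64-byte ul2 block size together set the cost of each L2 miss. The sketch below is hypothetical and assumes the stock sim-outorder model. In that model, the first bus-width chunk of a block pays the first-chunk latency and each remaining chunk pays the inter-chunk latency. As far as we know, `-mem:maxBurstLength` and `-mem:minBurstLength` are not in the stock 3.0 option list, so this build may differ. Their effect is not modeled here, and the results should be treated as an estimate.

```python
def mem_access_latency(blk_size: int, first_chunk: int, inter_chunk: int,
                       bus_width: int) -> int:
    """Cycles to move one cache block over the memory bus (stock model).

    The block is split into bus-width chunks (rounded up); the first chunk
    costs `first_chunk` cycles and every later chunk `inter_chunk` cycles.
    Burst-length limits from this build are NOT modeled (assumption).
    """
    if blk_size <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (blk_size + bus_width - 1) // bus_width  # round up
    return first_chunk + (chunks - 1) * inter_chunk


# This run: -mem:lat 62 2, -mem:width 8, 64-byte ul2 blocks -> 8 chunks.
print(mem_access_latency(64, 62, 2, 8))   # 62 + 7 * 2 = 76
# The -mem:lat 91 10 setting from the first run in this file.
print(mem_access_latency(64, 91, 10, 8))  # 91 + 7 * 10 = 161
```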

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle
  count range.  All other range values represent an instruction count
  range.  The second argument, if prefixed with a `+', is relative to the
  first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be
  used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
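The following minimal Python sketch shows how a range string in the grammar above could be split into typed endpoints. The helper `parse_ptrace_range` is hypothetical. It assumes that numeric positions are decimal or `0x` hex. It leaves program symbols such as `main` unresolved. It also gives a `+` end the same kind as the start, which is our reading of "relative to the first argument".

```python
from typing import Optional, Tuple, Union

Pos = Union[int, str]                 # number, or unresolved program symbol
Endpoint = Optional[Tuple[str, Pos]]  # (kind, position) or None if omitted

PREFIX_KIND = {"@": "addr", "#": "cycle"}  # no prefix -> instruction count


def _endpoint(text: str) -> Endpoint:
    """Parse one side of the range, e.g. '#1000', '@main' or '1500'."""
    if not text:
        return None
    kind = PREFIX_KIND.get(text[0], "inst")
    body = text[1:] if text[0] in PREFIX_KIND else text
    try:
        return kind, int(body, 0)      # decimal or 0x-prefixed hex
    except ValueError:
        return kind, body              # symbol such as 'main', left unresolved


def parse_ptrace_range(spec: str) -> Tuple[Endpoint, Endpoint]:
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = _endpoint(start_txt)
    if end_txt.startswith("+"):
        # '+delta' is relative to the start and inherits its kind (assumption).
        if start is None:
            raise ValueError("a relative end needs an explicit start")
        kind, base = start
        delta = int(end_txt[1:], 0)
        end = (kind, base + delta if isinstance(base, int) else f"{base}+{delta}")
        return start, end
    return start, _endpoint(end_txt)


for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"]:
    print(spec, "->", parse_ptrace_range(spec))
```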

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (arguments in N, M, W, X order, as for -bpred:2lev):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
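To make the N, M, W, X parameters concrete, here is a toy direction predictor in Python. The class `TwoLevelPredictor` is hypothetical and only loosely modeled on SimpleScalar's bpred.c. Its indexing details, the 3-bit branch-address shift (which assumes 8-byte PISA instructions), and the counter initialization are all assumptions. When M = 2^W, the concatenated address bits are masked off. As a result, GAg indexes purely by history, and gshare indexes by history XOR address.

```python
class TwoLevelPredictor:
    """Toy 2-level direction predictor parameterized like -bpred:2lev N M W X.

    The branch address picks one of N history registers; the W-bit history
    is concatenated with (X=0) or XORed into (X=1) the address to index M
    2-bit saturating counters. Details are assumptions, not SimpleScalar's.
    """

    def __init__(self, n: int, m: int, w: int, xor: int, br_shift: int = 3):
        for v in (n, m):
            if v <= 0 or v & (v - 1):
                raise ValueError("N and M must be powers of two")
        self.n, self.m, self.w, self.xor = n, m, w, xor
        self.br_shift = br_shift          # assumed: PISA insts are 8 bytes
        self.hist = [0] * n               # first level: shift registers
        self.counters = [1] * m           # second level: weakly not-taken

    def _index(self, pc: int):
        a = pc >> self.br_shift
        l1 = a & (self.n - 1)
        h = self.hist[l1]
        mask = (1 << self.w) - 1
        if self.xor:
            l2 = ((h ^ a) & mask) | (a << self.w)
        else:
            l2 = h | (a << self.w)
        return l1, l2 & (self.m - 1)

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)[1]] >= 2

    def update(self, pc: int, taken: bool) -> None:
        l1, l2 = self._index(pc)
        c = self.counters[l2]
        self.counters[l2] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.hist[l1] = ((self.hist[l1] << 1) | int(taken)) & ((1 << self.w) - 1)


# GAg with W = 4 (N=1, M=2^4) on a loop branch: taken 3x, then not taken.
pred = TwoLevelPredictor(1, 16, 4, 0)
pattern = [True, True, True, False] * 50
misses = 0
for i, outcome in enumerate(pattern):
    if i >= 160 and pred.predict(0x400140) != outcome:
        misses += 1
    pred.update(0x400140, outcome)
print("mispredictions after warm-up:", misses)  # 0
```

For example, the default `-bpred:2lev 1 1024 8 0` in the option list would correspond to `TwoLevelPredictor(1, 1024, 8, 0)`. That is a GAp-style configuration, because 1024 > 2^8.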

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
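The `<config>` string maps directly onto cache geometry. The short Python sketch below parses the format and derives the capacity and the address-bit split. The `parse_cache_config` helper and the `CacheConfig` type are hypothetical. Rejecting non-power-of-two set counts and block sizes is also our own assumption about sensible input. The TLB options use the same format, with the page size as `<bsize>`, so the same capacity formula gives a TLB's reach.

```python
from typing import NamedTuple

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total capacity in bytes: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize

    @property
    def offset_bits(self) -> int:
        return self.bsize.bit_length() - 1

    @property
    def index_bits(self) -> int:
        return self.nsets.bit_length() - 1


def _is_pow2(v: int) -> bool:
    return v > 0 and (v & (v - 1)) == 0


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)) or cfg.assoc <= 0:
        raise ValueError("nsets and bsize must be powers of two, assoc > 0")
    return cfg


for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"]:
    c = parse_cache_config(spec)
    print(f"{c.name}: {c.capacity // 1024} KiB, {c.assoc}-way "
          f"{REPL_POLICIES[c.repl]}, offset bits {c.offset_bits}, "
          f"index bits {c.index_bits}")
```

For the runs in this file, dl1 and il1 are each 64 KiB and 2-way. The ul2 is 512 KiB and 8-way. The 32-set, 4-way dtlb covers 512 KiB of 4 KiB pages.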

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
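The pointer rules can be captured as a small name-resolution function. This is a hypothetical Python sketch, `resolve_hierarchy`. It only reports which physical cache each level ends up using. It does not reproduce any additional validity checks the simulator may apply to these combinations.

```python
def resolve_hierarchy(il1: str, il2: str, dl1: str, dl2: str) -> dict:
    """Map each cache level to the name of the physical cache it uses.

    il1 may be a <config>, 'dl1', 'dl2' or 'none'; il2 may be a <config>,
    'dl2' or 'none'. Further simulator-side checks are not modeled.
    """
    def own(spec: str):
        # A full <config> names its own cache; 'none' means no cache.
        return None if spec == "none" else spec.split(":")[0]

    data = {"dl1": own(dl1), "dl2": own(dl2)}

    def inst(spec: str, allowed: tuple):
        # 'dl1'/'dl2' point the instruction level at a data-side cache.
        if spec in data:
            if spec not in allowed:
                raise ValueError(f"{spec!r} not allowed here")
            return data[spec]
        return own(spec)

    return {"il1": inst(il1, ("dl1", "dl2")), "il2": inst(il2, ("dl2",)),
            **data}


# This log: separate L1s, il2 pointed at the unified ul2.
print(resolve_hierarchy("il1:512:64:2:l", "dl2",
                        "dl1:512:64:2:l", "ul2:1024:64:8:l"))
```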



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375076 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9255859 # total simulation time in cycles
sim_IPC                      1.4392 # instructions per cycle
sim_CPI                      0.6948 # cycles per instruction
sim_exec_BW                  1.4450 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  36067803 # cumulative IFQ occupancy
IFQ_fcount                  8866306 # cumulative IFQ full count
ifq_occupancy                3.8968 # avg IFQ occupancy (insn's)
ifq_rate                     1.4450 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6966 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 145182537 # cumulative RUU occupancy
RUU_fcount                  8725397 # cumulative RUU full count
ruu_occupancy               15.6855 # avg RUU occupancy (insn's)
ruu_rate                     1.4450 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.8547 # avg RUU occupant latency (cycle's)
ruu_full                     0.9427 # fraction of time (cycle's) RUU was full
LSQ_count                  76167359 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2291 # avg LSQ occupancy (insn's)
lsq_rate                     1.4450 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6947 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  241065342 # total number of slip cycles
avg_sim_slip                18.0968 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
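Each statistics line follows a `<name> <value> # <description>` layout, so the derived figures can be rechecked from the raw counters. The Python sketch below uses a hypothetical `parse_stats` helper, which is not part of SimpleScalar, on a few lines copied from the statistics block above. The log prints rates to four decimals, so the check accepts a difference of up to half a unit in the last digit.

```python
import re
from typing import Dict, Union

Value = Union[int, float, str, None]

# '<name>  <value>  # <description>'; the value may contain spaces,
# e.g. '<error: divide by zero>'.
STAT_LINE = re.compile(r"^(\S+)\s+(.+?)\s+#")


def _value(tok: str) -> Value:
    if tok.startswith("<error"):
        return None                      # rate the simulator could not compute
    try:
        return int(tok, 0)               # counters and 0x... addresses
    except ValueError:
        pass
    try:
        return float(tok)                # rates such as 1.4392
    except ValueError:
        return tok                       # e.g. '760k'


def parse_stats(text: str) -> Dict[str, Value]:
    # Skip the option dump if a whole run is passed in.
    body = text.split("sim: ** simulation statistics **", 1)[-1]
    stats = {}
    for line in body.splitlines():
        m = STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = _value(m.group(2))
    return stats


SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_num_branches             387182 # total number of branches committed
sim_cycle                   9255859 # total simulation time in cycles
sim_IPC                      1.4392 # instructions per cycle
sim_CPI                      0.6948 # cycles per instruction
sim_IPB                     34.4048 # instruction per branch
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
"""

s = parse_stats(SAMPLE)
derived = {
    "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
    "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
    "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
    "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
}
for name, value in derived.items():
    # Reported values are printed to 4 decimals, so allow half a unit.
    print(f"{name}: derived {value:.4f}, reported {s[name]}",
          "OK" if abs(value - s[name]) < 5e-5 else "MISMATCH")
```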

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:34:03 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12564934 # total simulation time in cycles
sim_IPC                      1.7015 # instructions per cycle
sim_CPI                      0.5877 # cycles per instruction
sim_exec_BW                  1.7071 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49113574 # cumulative IFQ occupancy
IFQ_fcount                 11659221 # cumulative IFQ full count
ifq_occupancy                3.9088 # avg IFQ occupancy (insn's)
ifq_rate                     1.7071 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2897 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199590629 # cumulative RUU occupancy
RUU_fcount                 12439096 # cumulative RUU full count
ruu_occupancy               15.8847 # avg RUU occupancy (insn's)
ruu_rate                     1.7071 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3049 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  63986803 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0925 # avg LSQ occupancy (insn's)
lsq_rate                     1.7071 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9831 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291360269 # total number of slip cycles
avg_sim_slip                13.6282 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
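In the `filter` run above, `bpred_jr_non_ras_rate.PP` prints `<error: divide by zero>` because no non-RAS JRs were seen (0/0). The Python sketch below recomputes a few rates from the raw counters. It uses a hypothetical `rate` helper that returns `None` in the zero-denominator case. The sketch also checks that the ul2 access count equals il1 misses plus dl1 misses plus dl1 writebacks, which holds exactly for this run.

```python
from typing import Optional


def rate(num: int, den: int) -> Optional[float]:
    """num / den, or None where the log prints '<error: divide by zero>'."""
    return None if den == 0 else num / den


# Raw counters copied from the `filter` run statistics above.
c = {
    "jr_hits": 45, "jr_seen": 52,
    "jr_non_ras_hits": 0, "jr_non_ras_seen": 0,
    "il1_misses": 179,
    "dl1_misses": 3131, "dl1_writebacks": 791, "dl1_accesses": 4943189,
    "ul2_accesses": 4101, "ul2_misses": 2489,
}

print("bpred_jr_rate  ", rate(c["jr_hits"], c["jr_seen"]))
print("jr_non_ras_rate", rate(c["jr_non_ras_hits"], c["jr_non_ras_seen"]))
print("dl1.miss_rate  ", rate(c["dl1_misses"], c["dl1_accesses"]))
print("ul2.miss_rate  ", rate(c["ul2_misses"], c["ul2_accesses"]))

# il2 points at dl2, so ul2 sees il1 misses, dl1 misses and dl1 writebacks.
upstream = c["il1_misses"] + c["dl1_misses"] + c["dl1_writebacks"]
print("ul2 accesses accounted for:", upstream == c["ul2_accesses"])
```

The same sum also matches the earlier run in this section (727 + 287722 + 143444 = 431893). In addition, `ul2.replacements` and `ul2.invalidations` are both 0 here, so no resident L2 block was ever displaced. The high 0.6069 L2 miss rate therefore reflects first-touch (compulsory) misses over only 4101 accesses, rather than capacity or conflict pressure.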

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 8 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:34:17 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
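The -mem:lat pair above is described as <first_chunk> <inter_chunk>, and the runs in this log differ in -mem:width (8 bytes here, 16 in the next two runs). The Python sketch below estimates the cost of one memory block transfer. It assumes the stock sim-outorder chunk model: a block moves in ceil(block size / bus width) chunks, the first costing <first_chunk> cycles and each later one <inter_chunk> cycles. It ignores -mem:maxBurstLength/-mem:minBurstLength, whose effect this output does not document. The function name is hypothetical.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to move one block from memory under the assumed chunk model."""
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    # Number of bus transfers needed to cover the block (ceiling division).
    chunks = -(-block_bytes // bus_width)
    return first_chunk + inter_chunk * (chunks - 1)


# -mem:lat 62 2 with the 64-byte ul2 blocks (ul2:1024:64:8:l):
print(mem_access_latency(64, 8, 62, 2))    # 8 chunks -> 76
print(mem_access_latency(64, 16, 62, 2))   # 4 chunks -> 68
```

Under this model, doubling the bus width halves the chunk count, saving 8 cycles per L2 miss here. How much that matters overall depends on how often ul2 misses.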

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
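The range grammar above can be sketched as a small Python parser. TraceRange, parse_ptrace_range and the `symbols` table are hypothetical; sim-outorder resolves program symbols from the loaded binary itself. The sketch also makes three assumptions: numbers are decimal or 0x-prefixed hex, a prefix-less end is an instruction count, and both ends must select the same kind of range.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass
class TraceRange:
    kind: str              # "addr" for @, "cycle" for #, "insn" otherwise
    start: Optional[int]   # None: trace from the beginning
    end: Optional[int]     # None: trace to the end


def _value(tok: str, symbols: Mapping[str, int]) -> int:
    # Decimal or 0x-prefixed hex; anything else is treated as a program symbol.
    if tok.lower().startswith("0x"):
        return int(tok, 16)
    if tok.isdigit():
        return int(tok)
    if tok not in symbols:
        raise ValueError(f"unknown symbol: {tok!r}")
    return symbols[tok]


def _split_prefix(tok: str) -> Tuple[str, str]:
    # Strip an optional @ or # prefix and report which kind of range it selects.
    if tok.startswith("@"):
        return "addr", tok[1:]
    if tok.startswith("#"):
        return "cycle", tok[1:]
    return "insn", tok


def parse_ptrace_range(spec: str,
                       symbols: Optional[Mapping[str, int]] = None) -> TraceRange:
    symbols = symbols or {}
    start_tok, sep, end_tok = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    kind: Optional[str] = None
    start: Optional[int] = None
    if start_tok:
        kind, start_tok = _split_prefix(start_tok)
        start = _value(start_tok, symbols)

    end: Optional[int] = None
    if end_tok.startswith("+"):
        # '+' makes the end relative to the start: 1000:+100 == 1000:1100.
        if start is None:
            raise ValueError("a relative end needs a start")
        end = start + _value(end_tok[1:], symbols)
    elif end_tok:
        end_kind, end_tok = _split_prefix(end_tok)
        if kind is not None and end_kind != kind:
            raise ValueError("both ends must be the same kind of range")
        kind = end_kind
        end = _value(end_tok, symbols)

    return TraceRange(kind or "insn", start, end)


print(parse_ptrace_range("1000:+100"))  # TraceRange(kind='insn', start=1000, end=1100)
```

For symbolic ranges such as `@main:+278`, pass a symbol table, e.g. `parse_ptrace_range("@main:+278", {"main": 0x400200})`. The address 0x400200 is an arbitrary placeholder, not the real location of `main` in these programs.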

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
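The sample rows list the fields as N, W, M, X. The -bpred:2lev option, however, takes <l1size> <l2size> <hist_size> <xor>, i.e. N, M, W, X. Read that way, the -bpred:2lev value shown in the dump, `1 1024 8 0`, matches the GAp row (M = 1024 > 2^8), although it is unused in these runs, which select bimod. The Python sketch below is a generic two-level adaptive predictor parameterised by N, M, W and X, meant only to show how the fields interact. The class and helper names are hypothetical. The PC shift of 3 (for 8-byte PISA instructions), the weakly-taken initial counters and the exact index formula are assumptions, not a transcription of sim-outorder's predictor code.

```python
class TwoLevelPredictor:
    """Sketch of a two-level adaptive predictor configured as N, M, W, X."""

    def __init__(self, n: int, m: int, w: int, x: int, pc_shift: int = 3):
        self.n, self.m, self.w, self.x = n, m, w, x
        self.pc_shift = pc_shift           # assumed: 8-byte instructions
        self.hist_mask = (1 << w) - 1
        self.history = [0] * n             # first level: N shift registers of W bits
        self.counters = [2] * m            # second level: M 2-bit counters, weakly taken

    def _index(self, pc: int):
        addr = pc >> self.pc_shift
        l1 = addr % self.n                 # which history register this branch uses
        hist = self.history[l1]
        if self.x:
            # X = 1: XOR the history into the low branch-address bits (gshare style)
            l2 = ((hist ^ addr) & self.hist_mask) | (addr << self.w)
        else:
            # X = 0: place branch-address bits above the history bits
            l2 = hist | (addr << self.w)
        return l1, l2 % self.m             # keep only as many bits as M counters allow

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)[1]] >= 2

    def update(self, pc: int, taken: bool) -> None:
        l1, l2 = self._index(pc)
        c = self.counters[l2]
        self.counters[l2] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.history[l1] = ((self.history[l1] << 1) | int(taken)) & self.hist_mask


def accuracy(pred: TwoLevelPredictor, pc: int, pattern, warmup: int, trials: int) -> float:
    """Train on a repeating outcome pattern, then measure prediction accuracy."""
    for _ in range(warmup):
        for taken in pattern:
            pred.update(pc, taken)
    hits = total = 0
    for _ in range(trials):
        for taken in pattern:
            hits += pred.predict(pc) == taken
            total += 1
            pred.update(pc, taken)
    return hits / total


loop = [True, True, True, False]   # a loop branch taken three times, then exiting
print(accuracy(TwoLevelPredictor(1, 16, 4, 0), 0x400140, loop, 20, 10))  # GAg: 1.0
print(accuracy(TwoLevelPredictor(1, 16, 4, 1), 0x400140, loop, 20, 10))  # gshare: 1.0
```

With W = 4, the four positions in the loop pattern see four distinct histories. Each therefore trains its own counter, which is why both configurations predict the pattern perfectly after warm-up.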

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
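As a worked example of the <config> format, the Python sketch below parses the strings used in this run. It derives total capacity (nsets x assoc x bsize) and the tag / set index / block offset split of an address. The helper names are hypothetical. The power-of-two check and the address split reflect the usual set-associative layout and are assumptions of this sketch. For a TLB the same fields apply with <bsize> taken as the page size, so nsets x assoc is the entry count and the capacity is the TLB's reach.

```python
from dataclasses import dataclass

REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        # Total bytes covered: sets x ways x block size.
        return self.nsets * self.assoc * self.bsize

    def split_address(self, addr: int):
        """Return (tag, set index, block offset) for a byte address."""
        offset_bits = self.bsize.bit_length() - 1
        index_bits = self.nsets.bit_length() - 1
        offset = addr & (self.bsize - 1)
        index = (addr >> offset_bits) & (self.nsets - 1)
        tag = addr >> (offset_bits + index_bits)
        return tag, index, offset


def parse_cache_config(text: str) -> CacheConfig:
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError("expected <name>:<nsets>:<bsize>:<assoc>:<repl>")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])
    # Assumed constraint: set count and block size are powers of two,
    # so addresses can be split with shifts and masks.
    for value in (cfg.nsets, cfg.bsize):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{value} is not a power of two")
    if cfg.assoc <= 0:
        raise ValueError("associativity must be positive")
    return cfg


dl1 = parse_cache_config("dl1:512:64:2:l")
ul2 = parse_cache_config("ul2:1024:64:8:l")
dtlb = parse_cache_config("dtlb:32:4096:4:l")
print(dl1.capacity // 1024, "KiB")                    # 64 KiB
print(ul2.capacity // 1024, "KiB")                    # 512 KiB
print(dtlb.nsets * dtlb.assoc, "TLB entries")         # 128 TLB entries
print(dl1.split_address(0x10000000 + 0x1234))         # (8192, 72, 52)
```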

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
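Unification works by aliasing: an instruction-cache level named "dl1" or "dl2" reuses the data cache object rather than building its own. The Python sketch below resolves the four options that way. The alias rules follow the option help (il1 accepts dl1|dl2, il2 accepts dl2), and il2 defaults to dl2, as the option dump shows. The Cache class and build_hierarchy are hypothetical and model only naming and sharing, not cache behaviour.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(eq=False)   # compare by identity: unified levels are literally the same object
class Cache:
    name: str
    config: str


# Which data-cache names each instruction-cache level may point at.
ALLOWED_ALIASES = {"il1": ("dl1", "dl2"), "il2": ("dl2",)}


def _make(opt: str) -> Optional[Cache]:
    return None if opt == "none" else Cache(opt.split(":", 1)[0], opt)


def build_hierarchy(dl1: str, dl2: str, il1: str,
                    il2: str = "dl2") -> Dict[str, Optional[Cache]]:
    levels: Dict[str, Optional[Cache]] = {"dl1": _make(dl1), "dl2": _make(dl2)}
    for level, opt in (("il1", il1), ("il2", il2)):
        if opt in ("dl1", "dl2"):
            if opt not in ALLOWED_ALIASES[level]:
                raise ValueError(f"-cache:{level} cannot point at {opt}")
            if levels[opt] is None:
                raise ValueError(f"-cache:{level} points at undefined {opt}")
            levels[level] = levels[opt]     # share the data cache object
        else:
            levels[level] = _make(opt)
    return levels


# Configuration used in these runs: split L1 caches, unified L2.
h = build_hierarchy("dl1:512:64:2:l", "ul2:1024:64:8:l", "il1:512:64:2:l")
print(h["il2"] is h["dl2"], h["il1"] is h["dl1"])   # True False

# Fully unified example from the help text.
u = build_hierarchy("ul1:256:32:1:l", "ul2:1024:64:2:l", "dl1")
print(u["il1"].name, u["il2"].name)                 # ul1 ul2
```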



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861203 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37342369 # total simulation time in cycles
sim_IPC                      0.7461 # instructions per cycle
sim_CPI                      1.3404 # cycles per instruction
sim_exec_BW                  0.7461 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 149298505 # cumulative IFQ occupancy
IFQ_fcount                 37324386 # cumulative IFQ full count
ifq_occupancy                3.9981 # avg IFQ occupancy (insn's)
ifq_rate                     0.7461 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3587 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 597196496 # cumulative RUU occupancy
RUU_fcount                 37323662 # cumulative RUU full count
ruu_occupancy               15.9925 # avg RUU occupancy (insn's)
ruu_rate                     0.7461 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.4347 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 181533023 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8613 # avg LSQ occupancy (insn's)
lsq_rate                     0.7461 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5156 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  815238202 # total number of slip cycles
avg_sim_slip                29.2623 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
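Most derived statistics above follow from the raw counters, as their descriptions suggest. IPC and CPI relate committed instructions to cycles. Occupancies divide cumulative counts by cycles. Latencies divide the same counts by executed instructions (Little's law). Miss rates are misses per access. The Python sketch below parses "name value # description" lines and recomputes a few of them from counters copied from this run. The formulas are inferred from the descriptions and checked against the printed values, not quoted from the simulator source. The function names are hypothetical.

```python
import re
from typing import Dict, Union

Value = Union[int, float, str]
STAT_LINE = re.compile(r"^(\S+)\s+(.+?)\s+#\s?(.*)$")


def parse_stats(text: str) -> Dict[str, Value]:
    """Parse 'name value # description' lines; non-numeric values stay strings."""
    stats: Dict[str, Value] = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if not m:
            continue
        name, raw = m.group(1), m.group(2)
        for conv in (lambda s: int(s, 0), float):   # int(s, 0) also accepts 0x... hex
            try:
                stats[name] = conv(raw)
                break
            except ValueError:
                pass
        else:
            stats[name] = raw   # e.g. '1480k' or '<error: divide by zero>'
    return stats


def derived(s: Dict[str, Value]) -> Dict[str, float]:
    cycles, insn, total = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    return {
        "sim_IPC": insn / cycles,
        "sim_CPI": cycles / insn,
        "sim_exec_BW": total / cycles,
        "sim_IPB": insn / s["sim_num_branches"],
        # Little's law: occupancy = count / cycles, latency = count / instructions
        "ruu_occupancy": s["RUU_count"] / cycles,
        "ruu_latency": s["RUU_count"] / total,
        "avg_sim_slip": s["sim_slip"] / insn,
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }


sample = """\
sim_num_insn               27859692 # total number of instructions committed
sim_num_branches             481706 # total number of branches committed
sim_total_insn             27861203 # total number of instructions executed
sim_cycle                  37342369 # total simulation time in cycles
RUU_count                 597196496 # cumulative RUU occupancy
sim_slip                  815238202 # total number of slip cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
ul2.accesses                3529698 # total number of accesses
ul2.misses                    68321 # total number of misses
"""
for name, value in derived(parse_stats(sample)).items():
    print(f"{name:15s} {value:.4f}")
```

Rounded to four places, the recomputed values agree with the printed sim_IPC (0.7461), sim_CPI (1.3404), ruu_latency (21.4347), avg_sim_slip (29.2623), dl1.miss_rate (0.2972) and ul2.miss_rate (0.0194).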

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:34:40 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6793271 # total simulation time in cycles
sim_IPC                      1.9294 # instructions per cycle
sim_CPI                      0.5183 # cycles per instruction
sim_exec_BW                  1.9356 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25294432 # cumulative IFQ occupancy
IFQ_fcount                  6198005 # cumulative IFQ full count
ifq_occupancy                3.7235 # avg IFQ occupancy (insn's)
ifq_rate                     1.9356 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9237 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104455385 # cumulative RUU occupancy
RUU_fcount                  5634166 # cumulative RUU full count
ruu_occupancy               15.3763 # avg RUU occupancy (insn's)
ruu_rate                     1.9356 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9440 # avg RUU occupant latency (cycle's)
ruu_full                     0.8294 # fraction of time (cycle's) RUU was full
LSQ_count                  31907578 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6969 # avg LSQ occupancy (insn's)
lsq_rate                     1.9356 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4266 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153414273 # total number of slip cycles
avg_sim_slip                11.7046 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:34:48 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
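The -mem:lat, -mem:width and L2 block size settings above together fix the cost of an L2 miss. The sketch below assumes this build keeps stock sim-outorder's chunked latency model: the first bus-width chunk pays the full first-chunk latency, and each further chunk adds the inter-chunk delay. -mem:minBurstLength and -mem:maxBurstLength are not part of that formula, so the sketch does not model them. The function name is hypothetical.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from memory under a chunked bus model."""
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    # number of bus-width transfers needed per block, rounded up
    chunks = -(-block_bytes // bus_width)
    # the first chunk pays the full latency, each later chunk the inter-chunk delay
    return first_chunk + inter_chunk * (chunks - 1)


# This run: ul2 blocks are 64 bytes, -mem:width 16, -mem:lat 62 2
print(mem_access_latency(64, 16, 62, 2))   # 68 = 62 + 3*2
```

The matrix run used -mem:width 8 and -mem:lat 91 10. Under the same model, mem_access_latency(64, 8, 91, 10) gives 91 + 7*10 = 161 cycles for the same 64-byte block.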

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
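The range grammar above can be sketched as a small parser. This is a hypothetical helper, not the simulator's own range code. It splits and classifies the two ends and folds a numeric `+' offset into an absolute end. It leaves program symbols such as main unresolved, because resolving them requires the program's symbol table.

```python
from typing import NamedTuple, Optional, Tuple, Union

Value = Union[int, str]

class Endpoint(NamedTuple):
    kind: str               # "addr" for @, "cycle" for #, "inst" otherwise
    value: Value            # a number, or an unresolved program symbol
    relative: bool = False  # True only for a +offset that could not be folded

_PREFIX = {"@": "addr", "#": "cycle"}

def _value(text: str) -> Value:
    if not text:
        raise ValueError("empty range value")
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)
    return text  # program symbol, e.g. "main"

def _endpoint(text: str) -> Endpoint:
    if text[0] in _PREFIX:
        return Endpoint(_PREFIX[text[0]], _value(text[1:]))
    return Endpoint("inst", _value(text))

def parse_ptrace_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    start = _endpoint(start_txt) if start_txt else None
    if not end_txt:
        return start, None
    if end_txt[0] != "+":
        return start, _endpoint(end_txt)
    if start is None:
        raise ValueError("a '+' end needs a start value")
    offset = _value(end_txt[1:])
    if isinstance(start.value, int) and isinstance(offset, int):
        # 1000:+100 becomes 1000:1100
        return start, Endpoint(start.kind, start.value + offset)
    # symbolic start such as @main: keep the offset relative
    return start, Endpoint(start.kind, offset, relative=True)


for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, parse_ptrace_range(spec))
```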

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
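The -bpred:2lev option takes its four values in the order <l1size> <l2size> <hist_size> <xor>, which is N, M, W, X. The minimal sketch below uses hypothetical helper names. It builds argument tuples in that order for the sample schemes and classifies a given tuple. PAp is omitted to keep the sketch short.

```python
from typing import Optional, Tuple

TwoLev = Tuple[int, int, int, int]  # (l1size, l2size, hist_size, xor) = N, M, W, X

def twolev_args(scheme: str, w: int, n: int = 1, m: Optional[int] = None) -> TwoLev:
    """Build -bpred:2lev arguments for a sample scheme (hypothetical helper)."""
    if scheme == "GAg":
        return (1, 2 ** w, w, 0)
    if scheme == "GAp":
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        return (1, m, w, 0)
    if scheme == "PAg":
        return (n, 2 ** w, w, 0)
    if scheme == "gshare":
        return (1, 2 ** w, w, 1)
    raise ValueError(f"unsupported scheme {scheme!r}")

def classify(n: int, m: int, w: int, x: int) -> str:
    """Name the sample scheme a -bpred:2lev tuple matches, or 'other'."""
    if x:
        return "gshare" if n == 1 and m == 2 ** w else "other"
    if n == 1:
        if m == 2 ** w:
            return "GAg"
        if m > 2 ** w:
            return "GAp"
    elif m == 2 ** w:
        return "PAg"
    return "other"


print(" ".join(str(v) for v in twolev_args("gshare", 12)))  # 1 4096 12 1
print(classify(1, 1024, 8, 0))                              # GAp
```

By this reading, the -bpred:2lev default shown above (1 1024 8 0) describes a GAp predictor. It has no effect in these runs, because -bpred is bimod.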

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
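The <config> string above is enough to derive a cache's capacity and its set-index function. The following sketch uses hypothetical names and parses only full five-field configs, so it rejects aliases such as dl2 or none. Capacity is nsets x assoc x bsize. For a TLB, bsize is the page size, so the product is the amount of memory the TLB can map.

```python
from typing import NamedTuple

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}

class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # sets x ways x block size (for a TLB: mapped bytes)
        return self.nsets * self.assoc * self.bsize

    def set_index(self, addr: int) -> int:
        # drop the block-offset bits, then select one of nsets sets;
        # for power-of-two sizes this equals shifting and masking
        return (addr // self.bsize) % self.nsets

def parse_cache_config(text: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KiB", cfg.repl)
```

For the configurations used in these runs, this gives 64 KiB for dl1 and il1, 512 KiB for ul2, and 512 KiB of reach for dtlb.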

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
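In these runs -cache:il2 is dl2, so the L2 named ul2 is shared by instructions and data. That is why the statistics report ul2 but no separate il2. The minimal sketch below uses hypothetical helpers. It resolves which cache serves each level and then checks the shared L2's traffic. The check assumes that every dl1 miss, every dl1 write-back and every il1 miss reaches ul2 exactly once. Under that assumption, the sort run's counters give 11205 + 7079 + 217 = 18501, which equals the reported ul2.accesses. The fft run's 287722 + 143444 + 727 = 431893 also matches.

```python
from typing import Dict, Optional

def resolve_levels(opts: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map dl1/dl2/il1/il2 to the name of the cache that serves them."""
    def own_name(spec: str) -> Optional[str]:
        return None if spec == "none" else spec.split(":")[0]

    levels = {lvl: own_name(opts.get(lvl, "none")) for lvl in ("dl1", "dl2")}
    for lvl in ("il1", "il2"):
        spec = opts.get(lvl, "none")
        # "dl1"/"dl2" point an instruction level at the data hierarchy
        levels[lvl] = levels[spec] if spec in ("dl1", "dl2") else own_name(spec)
    return levels

def expected_l2_accesses(dl1_misses: int, dl1_writebacks: int, il1_misses: int) -> int:
    # with il2 -> dl2, the shared L2 sees data misses, dirty write-backs
    # and instruction misses
    return dl1_misses + dl1_writebacks + il1_misses


run = {"il1": "il1:512:64:2:l", "il2": "dl2",
       "dl1": "dl1:512:64:2:l", "dl2": "ul2:1024:64:8:l"}
print(resolve_levels(run))
print(expected_l2_accesses(11205, 7079, 217))  # sort run: ul2.accesses is 18501
```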



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264649 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6328693 # total simulation time in cycles
sim_IPC                      1.8300 # instructions per cycle
sim_CPI                      0.5464 # cycles per instruction
sim_exec_BW                  1.9379 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18634339 # cumulative IFQ occupancy
IFQ_fcount                  3830425 # cumulative IFQ full count
ifq_occupancy                2.9444 # avg IFQ occupancy (insn's)
ifq_rate                     1.9379 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5194 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6052 # fraction of time (cycle's) IFQ was full
RUU_count                  76783158 # cumulative RUU occupancy
RUU_fcount                  3228513 # cumulative RUU full count
ruu_occupancy               12.1325 # avg RUU occupancy (insn's)
ruu_rate                     1.9379 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2605 # avg RUU occupant latency (cycle's)
ruu_full                     0.5101 # fraction of time (cycle's) RUU was full
LSQ_count                  32017456 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0591 # avg LSQ occupancy (insn's)
lsq_rate                     1.9379 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6105 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122867710 # total number of slip cycles
avg_sim_slip                10.6090 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
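Several statistics above are ratios of other statistics. sim_IPC is sim_num_insn/sim_cycle, and sim_CPI is its inverse. sim_exec_BW is sim_total_insn/sim_cycle, and sim_IPB is sim_num_insn/sim_num_branches. Each miss_rate is misses/accesses. The minimal sketch below uses hypothetical function names. It parses "name value # description" lines and recomputes these ratios from the sort run's raw counters. Because the reported values are rounded to four decimals, the comparison allows a tolerance of 0.00005.

```python
import math
import re
from typing import Dict, Union

# "<name>  <value> # <description>"; option lines start with '-' or '#' and are skipped
STAT = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#")

def parse_stats(text: str) -> Dict[str, Union[float, int, str]]:
    stats: Dict[str, Union[float, int, str]] = {}
    for line in text.splitlines():
        m = STAT.match(line)
        if not m:
            continue
        name, raw = m.groups()
        if raw.startswith("0x"):
            stats[name] = int(raw, 16)      # segment addresses
        else:
            try:
                stats[name] = float(raw)
            except ValueError:
                stats[name] = raw           # e.g. "292k"
    return stats

def derived(s):
    cycles, insn = s["sim_cycle"], s["sim_num_insn"]
    return {
        "sim_IPC": insn / cycles,
        "sim_CPI": cycles / insn,
        "sim_exec_BW": s["sim_total_insn"] / cycles,
        "sim_IPB": insn / s["sim_num_branches"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
        "ul2.miss_rate": s["ul2.misses"] / s["ul2.accesses"],
    }

SORT = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_branches            3128464 # total number of branches committed
sim_total_insn             12264649 # total number of instructions executed
sim_cycle                   6328693 # total simulation time in cycles
sim_IPC                      1.8300 # instructions per cycle
sim_CPI                      0.5464 # cycles per instruction
sim_exec_BW                  1.9379 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
"""

stats = parse_stats(SORT)
for name, value in derived(stats).items():
    # reported values carry 4 decimals: allow half a unit in the last place
    ok = math.isclose(value, stats[name], abs_tol=5e-5)
    print(f"{name:15s} {value:.4f} reported {stats[name]:.4f} {'ok' if ok else 'MISMATCH'}")
```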

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:34:56 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375012 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9105691 # total simulation time in cycles
sim_IPC                      1.4629 # instructions per cycle
sim_CPI                      0.6836 # cycles per instruction
sim_exec_BW                  1.4689 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  35489982 # cumulative IFQ occupancy
IFQ_fcount                  8721851 # cumulative IFQ full count
ifq_occupancy                3.8976 # avg IFQ occupancy (insn's)
ifq_rate                     1.4689 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6535 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9578 # fraction of time (cycle's) IFQ was full
RUU_count                 142869391 # cumulative RUU occupancy
RUU_fcount                  8580957 # cumulative RUU full count
ruu_occupancy               15.6901 # avg RUU occupancy (insn's)
ruu_rate                     1.4689 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.6818 # avg RUU occupant latency (cycle's)
ruu_full                     0.9424 # fraction of time (cycle's) RUU was full
LSQ_count                  74790748 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2136 # avg LSQ occupancy (insn's)
lsq_rate                     1.4689 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.5918 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  237375745 # total number of slip cycles
avg_sim_slip                17.8198 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:35:07 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
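The -mem:lat and -mem:width options above set how long a cache-block fill from main memory takes. Assuming the stock sim-outorder 3.0 model, a block is moved in bus-width chunks: the first chunk costs the first -mem:lat value and each remaining chunk costs the second. The sketch below reproduces only that arithmetic. The -mem:maxBurstLength and -mem:minBurstLength options used in these runs are not part of the stock release, so this build may model bursts differently. The function name mem_access_latency is illustrative here, not a claim about this build's code.

```python
def mem_access_latency(block_bytes: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block from memory (stock chunked model, assumed).

    The block moves in ceil(block_bytes / bus_width) bus-width chunks; the
    first chunk costs `first_chunk` cycles and each later one `inter_chunk`.
    Burst-length limits (-mem:maxBurstLength/-mem:minBurstLength) are ignored.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_bytes // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte ul2 blocks with -mem:lat 62 2 and -mem:width 16, as in this run.
print(mem_access_latency(64, 16, 62, 2))
```

Under this assumption, a 16-byte bus gives 4 chunks and 62 + 3 * 2 = 68 cycles. The -mem:width 32 run later in this log would need 2 chunks, or 64 cycles.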

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced, and those that start with `#' designate a cycle-count
  range.  All other range values represent an instruction-count range.  If
  the second argument starts with `+', it is relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
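The sketch below shows how the range grammar above breaks down. The hypothetical parse_range function splits a range into (kind, value) endpoints. It is not SimpleScalar's own parser, and it makes three assumptions: numbers are read as decimal or 0x-hex, program symbols are left unresolved (the simulator would look them up in the symbol table), and a `+' offset is folded into the start only when the start is numeric.

```python
from typing import Optional, Tuple

Endpoint = Optional[tuple]

_KINDS = {"@": "addr", "#": "cycle"}  # no prefix means an instruction count


def _parse_endpoint(text: str) -> Endpoint:
    """Return (kind, value) for one side of the range, or None if omitted."""
    if not text:
        return None
    if text[0] in _KINDS:
        kind, body = _KINDS[text[0]], text[1:]
    else:
        kind, body = "insn", text
    try:
        value = int(body, 0)  # decimal or 0x-prefixed hex (assumption)
    except ValueError:
        value = body  # treat as an unresolved program symbol such as "main"
    return (kind, value)


def parse_range(spec: str) -> Tuple[Endpoint, Endpoint]:
    """Split a pipetrace range such as '#0:#1000' or '@main:+278'."""
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = _parse_endpoint(start_txt)
    if end_txt.startswith("+"):
        if start is None:
            raise ValueError("a '+' end needs a start")
        kind, base = start
        offset = int(end_txt[1:], 0)
        # A relative end inherits the start's kind; symbols cannot be folded here.
        end = (kind, base + offset) if isinstance(base, int) else (kind, base, offset)
    else:
        end = _parse_endpoint(end_txt)
    return start, end


for spec in ["#0:#1000", "@2000:", ":1500", ":", "1000:+100", "@main:+278"]:
    print(spec, "->", parse_range(spec))
```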

  Branch predictor configuration examples for the 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
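The -bpred:2lev option takes its arguments in the order <l1size> <l2size> <hist_size> <xor>, as the option dump above shows. In the N, M, W, X notation, that order is N, M, W, X. The hypothetical helper below builds that argument string for the sample schemes. PAp is left out because its sizing rule is easy to misread. The power-of-two check reflects an assumption about what the simulator accepts, not a documented guarantee.

```python
def _is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def twolev_args(scheme: str, w: int, n: int = 1, m: int = 0) -> str:
    """Return '<l1size> <l2size> <hist_size> <xor>' for -bpred:2lev.

    w: history (shift-register) width W
    n: number of first-level shift registers N (PAg only here)
    m: number of second-level counters M (GAp only)
    """
    if scheme == "GAg":
        l1, l2, xor = 1, 2 ** w, 0
    elif scheme == "GAp":
        if m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        l1, l2, xor = 1, m, 0
    elif scheme == "PAg":
        l1, l2, xor = n, 2 ** w, 0
    elif scheme == "gshare":
        l1, l2, xor = 1, 2 ** w, 1
    else:
        raise ValueError(f"unsupported scheme {scheme!r}")
    if not (_is_pow2(l1) and _is_pow2(l2)):
        raise ValueError("table sizes should be powers of two (assumed requirement)")
    return f"{l1} {l2} {w} {xor}"


print("-bpred 2lev -bpred:2lev", twolev_args("gshare", 10))
```

In this notation, the default in the option dump, 1 1024 8 0, is a GAp predictor with W = 8 and M = 1024.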

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
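A cache's total capacity follows directly from the format above: nsets * bsize * assoc bytes. The sketch below parses a <config> string with a hypothetical parse_cache_config helper and reports that product. For the TLB configs, the block size is the page size, so the same product gives the TLB's reach rather than its storage.

```python
from typing import NamedTuple

_REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)} in {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in _REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for cfg in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l"]:
    c = parse_cache_config(cfg)
    print(f"{c.name}: {c.capacity_bytes // 1024} KB, {c.assoc}-way, {_REPL[c.repl]}")
```

For the configuration used in these runs, this gives a 64 KB 2-way dl1, a 512 KB 8-way unified L2, and 256 KB of itlb reach.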

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 15 # total simulation time in seconds
sim_inst_rate          1425283.7333 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12546966 # total simulation time in cycles
sim_IPC                      1.7039 # instructions per cycle
sim_CPI                      0.5869 # cycles per instruction
sim_exec_BW                  1.7096 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
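The headline ratios above can be recomputed from the raw counters, which is a quick way to sanity-check a log after post-processing. The sketch below assumes each statistic sits on one "name value # description" line. The parse_stats and derived_ratios names are hypothetical. Non-numeric values, such as hex segment bases, "120k", or "<error: divide by zero>", are skipped.

```python
import re

_STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> dict:
    """Collect numeric 'name value # comment' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if not match:
            continue
        try:
            stats[match.group(1)] = float(match.group(2))
        except ValueError:
            pass  # hex addresses, '120k', and similar are ignored
    return stats


def derived_ratios(s: dict) -> dict:
    insn, cycles = s["sim_num_insn"], s["sim_cycle"]
    return {
        "sim_IPC": insn / cycles,
        "sim_CPI": cycles / insn,
        "sim_exec_BW": s["sim_total_insn"] / cycles,  # includes mis-speculated insts
        "sim_IPB": insn / s["sim_num_branches"],
        "sim_inst_rate": insn / s["sim_elapsed_time"],
    }


SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 15 # total simulation time in seconds
sim_total_insn             21450114 # total number of instructions executed
sim_cycle                  12546966 # total simulation time in cycles
"""

for name, value in derived_ratios(parse_stats(SAMPLE)).items():
    print(f"{name:15s} {value:.4f}")
```

These agree with the values printed for this run to four decimal places.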
IFQ_count                  49047270 # cumulative IFQ occupancy
IFQ_fcount                 11642645 # cumulative IFQ full count
ifq_occupancy                3.9091 # avg IFQ occupancy (insn's)
ifq_rate                     1.7096 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2866 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199325021 # cumulative RUU occupancy
RUU_fcount                 12422520 # cumulative RUU full count
ruu_occupancy               15.8863 # avg RUU occupancy (insn's)
ruu_rate                     1.7096 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2925 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63903507 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0931 # avg LSQ occupancy (insn's)
lsq_rate                     1.7096 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9792 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291011413 # total number of slip cycles
avg_sim_slip                13.6119 # the average slip between issue and retirement
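The IFQ, RUU, and LSQ figures above fit Little's law: average occupancy equals average arrival rate times average time spent in the queue. In this log, occupancy is the cumulative count divided by sim_cycle, and the rate is sim_total_insn / sim_cycle. The latency is therefore the cumulative count divided by sim_total_insn. avg_sim_slip is sim_slip averaged over committed instructions. The sketch below checks that reading against this run's IFQ and RUU lines. The queue_stats name is hypothetical.

```python
def queue_stats(count: int, full_count: int, cycles: int, total_insn: int) -> dict:
    """Derive per-queue averages from cumulative occupancy counters."""
    occupancy = count / cycles        # mean entries held per cycle
    rate = total_insn / cycles        # mean instructions entering per cycle
    return {
        "occupancy": occupancy,
        "rate": rate,
        "latency": occupancy / rate,  # Little's law: W = L / lambda
        "full": full_count / cycles,  # fraction of cycles the queue was full
    }


CYCLES, TOTAL_INSN, COMMITTED = 12546966, 21450114, 21379256
ifq = queue_stats(49047270, 11642645, CYCLES, TOTAL_INSN)
ruu = queue_stats(199325021, 12422520, CYCLES, TOTAL_INSN)
avg_slip = 291011413 / COMMITTED      # slip is averaged over committed insts

print({k: round(v, 4) for k, v in ifq.items()})
print({k: round(v, 4) for k, v in ruu.items()})
print(round(avg_slip, 4))
```

In this run the RUU (16 entries) is full about 99% of the time, while the 32-entry LSQ never fills. That pattern suggests -ruu:size is the tighter window limit for this configuration.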
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
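Each predictor rate above is a plain ratio of two counters. The simulator prints "<error: divide by zero>" when a denominator is zero; here, no non-RAS JRs were seen. The hypothetical sketch below recomputes the rates and returns None in place of that error string. It also checks that misses equal updates minus dir_hits, which holds for both runs in this log.

```python
from typing import Optional


def ratio(num: int, den: int) -> Optional[float]:
    """num/den, or None where sim-outorder would print a divide-by-zero error."""
    return None if den == 0 else num / den


def bpred_rates(c: dict) -> dict:
    # Sanity check observed in this log: misses == updates - dir_hits.
    if c["misses"] != c["updates"] - c["dir_hits"]:
        raise ValueError("misses do not equal updates - dir_hits")
    return {
        "bpred_addr_rate": ratio(c["addr_hits"], c["updates"]),
        "bpred_dir_rate": ratio(c["dir_hits"], c["updates"]),
        "bpred_jr_rate": ratio(c["jr_hits"], c["jr_seen"]),
        "bpred_jr_non_ras_rate": ratio(c["jr_non_ras_hits"], c["jr_non_ras_seen"]),
        "ras_rate": ratio(c["ras_hits"], c["used_ras"]),
    }


filter_run = dict(updates=1647342, addr_hits=1638967, dir_hits=1639058,
                  misses=8284, jr_hits=45, jr_seen=52,
                  jr_non_ras_hits=0, jr_non_ras_seen=0,
                  ras_hits=45, used_ras=52)
for name, rate in bpred_rates(filter_run).items():
    print(name, "n/a" if rate is None else f"{rate:.4f}")
```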
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
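Every cache rate above is a counter divided by accesses, and the two cache levels are linked. In both runs of this log, ul2.accesses equals dl1.misses + dl1.writebacks + il1.misses. In other words, the unified L2 sees the miss streams of both L1 caches plus dirty dl1 evictions. The sketch below checks those relationships for this run. The helper names are hypothetical.

```python
def cache_rates(accesses: int, hits: int, misses: int,
                replacements: int, writebacks: int) -> dict:
    """Per-reference rates as printed in the statistics dump."""
    if hits + misses != accesses:
        raise ValueError("hits + misses should equal accesses")
    return {
        "miss_rate": misses / accesses,
        "repl_rate": replacements / accesses,
        "wb_rate": writebacks / accesses,
    }


def expected_l2_accesses(dl1_misses: int, dl1_writebacks: int, il1_misses: int) -> int:
    # Traffic into a unified L2: both L1 miss streams plus dirty dl1 victims.
    return dl1_misses + dl1_writebacks + il1_misses


dl1 = cache_rates(4943189, 4940058, 3131, 2107, 791)
ul2 = cache_rates(4101, 1612, 2489, 0, 0)
print({k: round(v, 4) for k, v in dl1.items()})
print({k: round(v, 4) for k, v in ul2.items()})
print(expected_l2_accesses(3131, 791, 179) == 4101)
```

The high ul2 miss rate here comes from very few L2 accesses. Because ul2.replacements is 0, every miss filled an empty block, so these are all cold misses.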
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
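The mem.* lines are simple ratios as well. The values in this log imply a 4 KB simulated page: 30 pages give 120k, and 370 pages in the next run give 1480k. The page-table miss rate is misses divided by accesses. The sketch below restates that arithmetic. The names are hypothetical, and PAGE_BYTES is inferred from these numbers.

```python
PAGE_BYTES = 4096  # inferred from mem.page_count vs mem.page_mem in this log


def page_mem_kb(page_count: int) -> int:
    return page_count * PAGE_BYTES // 1024


def ptab_miss_rate(misses: int, accesses: int) -> float:
    return misses / accesses


print(f"{page_mem_kb(30)}k", round(ptab_miss_rate(12303390, 178886464), 4))
```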

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:35:22 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861139 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  36980537 # total simulation time in cycles
sim_IPC                      0.7534 # instructions per cycle
sim_CPI                      1.3274 # cycles per instruction
sim_exec_BW                  0.7534 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 147857865 # cumulative IFQ occupancy
IFQ_fcount                 36964226 # cumulative IFQ full count
ifq_occupancy                3.9983 # avg IFQ occupancy (insn's)
ifq_rate                     0.7534 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3070 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 591433592 # cumulative RUU occupancy
RUU_fcount                 36963518 # cumulative RUU full count
ruu_occupancy               15.9931 # avg RUU occupancy (insn's)
ruu_rate                     0.7534 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.2279 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 179791815 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8618 # avg LSQ occupancy (insn's)
lsq_rate                     0.7534 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4531 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  807734122 # total number of slip cycles
avg_sim_slip                28.9929 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
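The alphaBlend run stresses the data side much harder than the filter run: dl1 misses about 30% of the time here, against well under 0.1% for filter. A rough average memory access time for data references can be sketched with the textbook two-level formula AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem). The values t_L1 = 3 and t_L2 = 12 come from -cache:dl1lat and -cache:dl2lat. The value t_mem = 68 is an assumption: it applies the stock chunked memory model (62 + 3 * 2 for a 64-byte block on a 16-byte bus). The estimate ignores out-of-order latency hiding, writeback traffic, TLB misses, and bus contention, so treat it as an order-of-magnitude figure only.

```python
def amat(t_l1: float, m_l1: float, t_l2: float, m_l2: float, t_mem: float) -> float:
    """Two-level average memory access time in cycles (serial, no overlap)."""
    return t_l1 + m_l1 * (t_l2 + m_l2 * t_mem)


# alphaBlend counters from this log; the 68-cycle memory latency is assumed.
m_dl1 = 2572213 / 8653398
m_ul2 = 68321 / 3529698
blend = amat(3, m_dl1, 12, m_ul2, 68)

# filter counters from the previous run, same latencies.
filt = amat(3, 3131 / 4943189, 12, 2489 / 4101, 68)

print(f"alphaBlend ~{blend:.2f} cycles, filter ~{filt:.2f} cycles")
```

The gap is consistent with the lower IPC reported for alphaBlend (0.7534 versus 1.7039), although the two runs simulate different programs.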
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 

sim: simulation started @ Thu Dec 15 11:35:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
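The -mem:lat comment describes a two-part latency: one value for the first chunk of a block and another for each later chunk, where a chunk is one -mem:width-byte bus transfer. The Python sketch below assumes the simple chunked model first_chunk + inter_chunk * (chunks - 1), with chunks = ceil(block_size / bus_width). The -mem:maxBurstLength and -mem:minBurstLength options are not modeled, so treat the results as an approximation for this build. The helper name mem_access_latency is hypothetical.

```python
import math


def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one cache block from memory (assumed chunked model).

    The first bus-width chunk costs `first_chunk` cycles and every remaining
    chunk costs `inter_chunk` cycles. Burst-length limits are ignored.
    """
    chunks = math.ceil(block_size / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte L2 blocks (ul2:1024:64:8:l) under this run's memory options
# and under the log's first command line (-mem:width 8 -mem:lat 91 10).
this_run = mem_access_latency(64, bus_width=32, first_chunk=62, inter_chunk=2)
first_run = mem_access_latency(64, bus_width=8, first_chunk=91, inter_chunk=10)
print(this_run, first_run)
```

Under this model the wider 32-byte bus needs only two chunks per 64-byte block, which is why its block latency is far lower than the 8-byte configuration despite a similar first-chunk latency.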

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, and those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if prefixed with a `+', is a value relative to the first
  argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be used in
  all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
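The range grammar above can be illustrated with a small parser. This is a sketch, not SimpleScalar's own range parser: it classifies each end as an address (@), a cycle count (#) or an instruction count (no prefix), resolves a `+' end relative to the start, and leaves program symbols such as main unresolved, since resolving them needs the program's symbol table. The Endpoint type and the parse_range name are hypothetical, and plain numbers are assumed to be decimal unless written with 0x.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class Endpoint:
    kind: str               # "addr" (@), "cycle" (#) or "insn" (no prefix)
    value: Union[int, str]  # a number, or an unresolved program symbol


def _value(text: str) -> Union[int, str]:
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)
    return text  # e.g. "main"; resolving it needs the symbol table


def _endpoint(text: str) -> Optional[Endpoint]:
    if not text:
        return None  # an omitted end is open
    if text[0] == "@":
        return Endpoint("addr", _value(text[1:]))
    if text[0] == "#":
        return Endpoint("cycle", _value(text[1:]))
    return Endpoint("insn", _value(text))


def parse_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Parse {{@|#}<start>}:{{@|#|+}<end>}; (None, None) means trace everything."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("a range must contain ':'")
    start = _endpoint(start_text)
    if not end_text.startswith("+"):
        return start, _endpoint(end_text)
    if start is None:
        raise ValueError("a '+' end needs a start to be relative to")
    offset = _value(end_text[1:])
    if isinstance(start.value, int) and isinstance(offset, int):
        return start, Endpoint(start.kind, start.value + offset)
    # Symbolic start (e.g. @main): keep the sum unresolved.
    return start, Endpoint(start.kind, f"{start.value}+{offset}")


for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100", "@main:+278"):
    print(spec, parse_range(spec))
```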

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
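The sample-predictor table can be read as a set of rules on N, M, W and X. The sketch below encodes those rules using the argument order of -bpred:2lev (<l1size> <l2size> <hist_size> <xor>). The PAp rule M == 2^(N+W) is taken literally as printed, configurations that match no row are reported as "other", and the function name classify_2lev is hypothetical. Note that these runs select -bpred bimod, so the 2-level settings shown are defaults that go unused.

```python
def classify_2lev(n: int, m: int, w: int, x: int) -> str:
    """Name the 2-level scheme for -bpred:2lev <N> <M> <W> <X>."""
    if n == 1:                      # a single global history register
        if m == 2 ** w:
            return "gshare" if x else "GAg"
        if m > 2 ** w and not x:
            return "GAp"
    elif not x:                     # N > 1: per-address history registers
        if m == 2 ** w:
            return "PAg"
        if m == 2 ** (n + w):
            return "PAp"
    return "other"


# Default -bpred:2lev 1 1024 8 0 from the option dump above.
print(classify_2lev(1, 1024, 8, 0))
```

With one history register of width 8 and 1024 counters (more than 2^8 = 256), the default falls into the GAp row.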

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
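A cache's capacity follows directly from the <config> fields: nsets x assoc x bsize bytes. The sketch below parses a <config> string and reports that product. For the TLB specs the 4096 field is presumably the page size, so the product is the TLB's reach rather than a storage size. The CacheConfig type and parse_cache_config name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity(self) -> int:
        """Bytes covered: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# Cache and TLB specs used by the runs in this log.
for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity // 1024, "KB")
```

For example, dl1:512:64:2:l is 512 sets x 2 ways x 64-byte blocks = 64 KB, and ul2:1024:64:8:l is 512 KB.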

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
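One way to picture the pointer arguments: an instruction-side level whose value is dl1 or dl2 names an existing data-side cache instead of defining a new one, so both sides share one cache. That matches the statistics in this log, where -cache:il2 dl2 produces no separate il2 counters and all L2 traffic is reported under ul2. The sketch below resolves such pointers in a plain dict. The function, the POINTERS table (built from the {<config>|dl1|dl2|none} and {<config>|dl2|none} choices in the option list) and the treatment of missing levels as none are assumptions for illustration.

```python
# Instruction-side levels may point at a data-side level instead of a spec.
POINTERS = {"il1": ("dl1", "dl2"), "il2": ("dl2",)}


def resolve_hierarchy(opts: dict) -> dict:
    """Map il1/il2/dl1/dl2 to the name of the cache each one actually uses."""
    resolved = {}
    # Data-side levels first, so instruction-side pointers can reuse them.
    for level in ("dl1", "dl2", "il1", "il2"):
        value = opts.get(level, "none")
        if value == "none":
            resolved[level] = None
        elif value in POINTERS.get(level, ()):
            resolved[level] = resolved[value]  # shared (unified) cache
        elif ":" in value:
            resolved[level] = value.split(":")[0]  # <name> field of the spec
        else:
            raise ValueError(f"-cache:{level} cannot be {value!r}")
    return resolved


# The cache options of the runs in this log.
print(resolve_hierarchy({
    "dl1": "dl1:512:64:2:l",
    "dl2": "ul2:1024:64:8:l",
    "il1": "il1:512:64:2:l",
    "il2": "dl2",
}))
```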



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6903199 # total simulation time in cycles
sim_IPC                      1.8987 # instructions per cycle
sim_CPI                      0.5267 # cycles per instruction
sim_exec_BW                  1.9048 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25694944 # cumulative IFQ occupancy
IFQ_fcount                  6298133 # cumulative IFQ full count
ifq_occupancy                3.7222 # avg IFQ occupancy (insn's)
ifq_rate                     1.9048 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9541 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9123 # fraction of time (cycle's) IFQ was full
RUU_count                 106061801 # cumulative RUU occupancy
RUU_fcount                  5734294 # cumulative RUU full count
ruu_occupancy               15.3642 # avg RUU occupancy (insn's)
ruu_rate                     1.9048 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0662 # avg RUU occupant latency (cycle's)
ruu_full                     0.8307 # fraction of time (cycle's) RUU was full
LSQ_count                  32444282 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6999 # avg LSQ occupancy (insn's)
lsq_rate                     1.9048 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4674 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  155556385 # total number of slip cycles
avg_sim_slip                11.8681 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
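Each statistic above is a "name value # description" line, so derived figures can be recomputed from the raw counters. The sketch below parses a few lines copied from this run (the matrix benchmark) and checks three relationships the numbers satisfy: sim_IPC = sim_num_insn / sim_cycle, ifq_occupancy = IFQ_count / sim_cycle, and ul2.accesses = il1.misses + dl1.misses + dl1.writebacks (the unified L2 seeing every L1 miss plus every dirty L1 eviction). The last identity is inferred from these counters rather than from SimpleScalar documentation, and the parse_stats helper is hypothetical.

```python
def _number(value: str):
    if value.lower().startswith("0x"):
        return int(value, 16)
    try:
        return int(value)
    except ValueError:
        return float(value)  # raises ValueError for text such as '1480k'


def parse_stats(text: str) -> dict:
    """Parse 'name value # description' lines into {name: value}.

    Values that are not numbers, such as '<error: divide by zero>',
    are kept as strings.
    """
    stats = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()
        parts = body.split(None, 1)
        if len(parts) != 2:
            continue  # blank, comment-only or malformed line
        name, value = parts
        try:
            stats[name] = _number(value)
        except ValueError:
            stats[name] = value
    return stats


SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6903199 # total simulation time in cycles
sim_IPC                      1.8987 # instructions per cycle
IFQ_count                  25694944 # cumulative IFQ occupancy
ifq_occupancy                3.7222 # avg IFQ occupancy (insn's)
il1.misses                      178 # total number of misses
dl1.misses                     5727 # total number of misses
dl1.writebacks                  492 # total number of writebacks
ul2.accesses                   6397 # total number of accesses
"""

s = parse_stats(SAMPLE)
ipc = s["sim_num_insn"] / s["sim_cycle"]
occupancy = s["IFQ_count"] / s["sim_cycle"]
l2_inflow = s["il1.misses"] + s["dl1.misses"] + s["dl1.writebacks"]
print(round(ipc, 4), round(occupancy, 4), l2_inflow)
```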

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:35:54 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264873 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6481135 # total simulation time in cycles
sim_IPC                      1.7870 # instructions per cycle
sim_CPI                      0.5596 # cycles per instruction
sim_exec_BW                  1.8924 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  19196393 # cumulative IFQ occupancy
IFQ_fcount                  3970939 # cumulative IFQ full count
ifq_occupancy                2.9619 # avg IFQ occupancy (insn's)
ifq_rate                     1.8924 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5652 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6127 # fraction of time (cycle's) IFQ was full
RUU_count                  79032697 # cumulative RUU occupancy
RUU_fcount                  3368975 # cumulative RUU full count
ruu_occupancy               12.1943 # avg RUU occupancy (insn's)
ruu_rate                     1.8924 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4438 # avg RUU occupant latency (cycle's)
ruu_full                     0.5198 # fraction of time (cycle's) RUU was full
LSQ_count                  32524783 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0184 # avg LSQ occupancy (insn's)
lsq_rate                     1.8924 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6519 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  125624589 # total number of slip cycles
avg_sim_slip                10.8470 # the average slip between issue and retirement
bpred_bimod.lookups         3257657 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820212 # total number of accesses
il1.hits                   12819995 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820212 # total number of accesses
itlb.hits                  12820205 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917906 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
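The bpred rate lines are plain ratios of the counters printed above them (addr_hits/updates, dir_hits/updates, jr_hits/jr_seen, jr_non_ras_hits/jr_non_ras_seen, ras_hits/used_ras), and bpred_bimod.misses equals updates - dir_hits in both runs shown here. The matrix run's bpred_jr_non_ras_rate.PP reads <error: divide by zero> because that run saw no non-RAS JRs (jr_non_ras_seen.PP is 0). The sketch below recomputes the sort run's rates from its counters and returns None instead of failing on a zero denominator; the rate helper and the dict layout are hypothetical.

```python
from typing import Optional


def rate(hits: int, total: int) -> Optional[float]:
    """hits/total, or None where sim-outorder prints '<error: divide by zero>'."""
    return None if total == 0 else hits / total


# Counters from the sort run's bpred_bimod.* lines.
sort = {
    "updates": 3128464, "addr_hits": 2984765, "dir_hits": 2990777,
    "jr_seen": 434901, "jr_hits": 427904,
    "jr_non_ras_seen": 8193, "jr_non_ras_hits": 1223,
    "used_ras": 426708, "ras_hits": 426681,
}

rates = {
    "bpred_addr_rate": rate(sort["addr_hits"], sort["updates"]),
    "bpred_dir_rate": rate(sort["dir_hits"], sort["updates"]),
    "bpred_jr_rate": rate(sort["jr_hits"], sort["jr_seen"]),
    "bpred_jr_non_ras_rate": rate(sort["jr_non_ras_hits"], sort["jr_non_ras_seen"]),
    "ras_rate": rate(sort["ras_hits"], sort["used_ras"]),
}
for name, value in rates.items():
    print(f"{name:24s} {value:.4f}")

# Direction mispredictions: updates that were not direction hits.
print("misses", sort["updates"] - sort["dir_hits"])

# The matrix run saw no non-RAS JRs, so its rate is undefined.
print(rate(0, 0))
```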

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:36:02 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
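As a reading aid, the Python sketch below parses a range string of the form `{{@|#}<start>}:{{@|#|+}<end>}`. Each side becomes an endpoint tagged as an address, cycle, or instruction-count range.

This is not sim-outorder's own parser. The following choices are assumptions:

* Numbers are read with `int(text, 0)`, so `0x...` is also accepted.
* Program symbols are looked up in a dictionary supplied by the caller.
* `Endpoint` and `parse_range` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Prefix characters that select the kind of range; anything else is an
# instruction count.
KINDS = {"@": "addr", "#": "cycle"}


@dataclass(frozen=True)
class Endpoint:
    kind: str   # "addr", "cycle" or "insn"
    value: int


def _number(text: str, symbols: Mapping[str, int]) -> int:
    # Program symbols (e.g. "main") must be resolved by the caller.
    return symbols[text] if text in symbols else int(text, 0)


def _endpoint(text: str, symbols: Mapping[str, int]) -> Endpoint:
    if text[0] in KINDS:
        return Endpoint(KINDS[text[0]], _number(text[1:], symbols))
    return Endpoint("insn", _number(text, symbols))


def parse_range(spec: str, symbols: Optional[Mapping[str, int]] = None
                ) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split "<start>:<end>"; an empty side means unbounded."""
    symbols = symbols or {}
    start_txt, sep, end_txt = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} has no ':'")
    start = _endpoint(start_txt, symbols) if start_txt else None
    end = None
    if end_txt.startswith("+"):
        # "+N" is relative to the start and inherits its kind.
        if start is None:
            raise ValueError("a '+' end needs an explicit start")
        end = Endpoint(start.kind, start.value + _number(end_txt[1:], symbols))
    elif end_txt:
        end = _endpoint(end_txt, symbols)
    return start, end
```

For example, `parse_range("1000:+100")` yields instruction-count endpoints 1000 and 1100, matching the `1000:+100 == 1000:1100` rule above. To handle `@main:+278`, the caller would pass a symbol table such as `{"main": 0x400200}`; that address is purely hypothetical.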

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
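The runs in this log use `-bpred bimod` with a 16384-entry table. The `-bpred:2lev` line in the option dump is a default that these runs do not exercise.

The argument order is easy to misread. On the command line, `-bpred:2lev` takes `<l1size> <l2size> <hist_size> <xor>`, i.e. N, M, W, X. The sample predictors above, however, list their values in N, W, M, X order.

The Python sketch below models two things: a bimodal table of 2-bit saturating counters, and one way of forming the second-level index of an N, W, M, X two-level predictor. Several details are assumptions not taken from this log:

* Branch PCs are shifted right by 3 bits (8-byte PISA instructions) before indexing.
* Counters start weakly not-taken.
* `Bimodal`, `saturate` and `l2_index` are hypothetical names, not SimpleScalar's API.

```python
BR_SHIFT = 3  # assumption: PISA instructions are 8 bytes, so drop 3 low PC bits


def saturate(counter: int, taken: bool) -> int:
    """Update a 2-bit saturating counter (range 0..3)."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


class Bimodal:
    """Table of 2-bit counters indexed by branch address bits."""

    def __init__(self, size: int):
        if size <= 0 or (size & (size - 1)) != 0:
            raise ValueError("table size must be a power of two")
        self.size = size
        self.counters = [1] * size  # start weakly not-taken (assumption)

    def _index(self, pc: int) -> int:
        return (pc >> BR_SHIFT) & (self.size - 1)

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 predict taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.counters[i] = saturate(self.counters[i], taken)


def l2_index(pc: int, history: int, hist_width: int, l2size: int,
             use_xor: bool) -> int:
    """Second-level table index for an (N, W, M, X) two-level predictor."""
    addr = pc >> BR_SHIFT
    if use_xor:
        # gshare-style: XOR the W history bits with the low address bits.
        low = (history ^ addr) & ((1 << hist_width) - 1)
    else:
        # GAg/GAp/PAg/PAp-style: use the W history bits unchanged.
        low = history
    # Higher address bits fill any index bits above the W history bits;
    # with M == 2^W (GAg, gshare) they are masked away entirely.
    return (low | (addr << hist_width)) & (l2size - 1)
```

With the initial counter value of 1, a single taken update moves the counter to 2 and flips the prediction to taken. A not-taken update then moves it back to 1, which is the hysteresis a 2-bit counter is meant to provide.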

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r
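The `<config>` string fully determines a cache's geometry. Capacity is nsets x bsize x assoc bytes, so the `dl1:512:64:2:l` and `ul2:1024:64:8:l` caches used in these runs hold 64 KiB and 512 KiB respectively. An address splits into three fields: a block offset of log2(bsize) bits, a set index of log2(nsets) bits, and a tag from the remaining high bits.

The same format applies to TLBs. In this build the TLB options are spelled `-tlb:itlb`/`-tlb:dtlb`, as the option dump shows. For `itlb:16:4096:4:l`, the same product (256 KiB) is the TLB reach, since the block size is the page size.

The Python sketch below parses the string and performs that split. `CacheConfig`, `parse_cache_config` and `split_address` are hypothetical names. Requiring power-of-two nsets and bsize is an assumption of the sketch.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


def _is_pow2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        return self.nsets * self.bsize * self.assoc

    @property
    def offset_bits(self) -> int:
        return self.bsize.bit_length() - 1   # log2(bsize)

    @property
    def index_bits(self) -> int:
        return self.nsets.bit_length() - 1   # log2(nsets)


def parse_cache_config(text: str) -> CacheConfig:
    """Parse "<name>:<nsets>:<bsize>:<assoc>:<repl>"."""
    name, nsets, bsize, assoc, repl = text.split(":")
    cfg = CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])
    if not (_is_pow2(cfg.nsets) and _is_pow2(cfg.bsize)):
        raise ValueError(f"{text!r}: nsets and bsize must be powers of two")
    return cfg


def split_address(cfg: CacheConfig, addr: int) -> tuple:
    """Return (tag, set index, block offset) for a byte address."""
    offset = addr & (cfg.bsize - 1)
    index = (addr >> cfg.offset_bits) & (cfg.nsets - 1)
    tag = addr >> (cfg.offset_bits + cfg.index_bits)
    return tag, index, offset
```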

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
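Pointing `-cache:il1` or `-cache:il2` at `dl1`/`dl2` makes the instruction side share the data-side cache instead of defining its own. The runs in this log use the unified-L2 form. The option dump shows `-cache:il2 dl2` even though only `-cache:dl1`, `-cache:il1` and `-cache:dl2` appear on the command line. Instruction L1 misses therefore go to `ul2`.

A minimal Python sketch of that resolution follows. `resolve_hierarchy` is a hypothetical helper. Two defaults are assumptions: an absent `-cache:il2` resolves to `dl2` (as in this run's option dump), and an absent data level resolves to `none`.

```python
def _cache_name(spec):
    """'ul2:1024:64:8:l' -> 'ul2'; 'none' -> None."""
    return None if spec == "none" else spec.split(":")[0]


def resolve_hierarchy(opts):
    """Map dl1/dl2/il1/il2 to the name of the cache that serves each level."""
    levels = {
        "dl1": _cache_name(opts.get("-cache:dl1", "none")),
        "dl2": _cache_name(opts.get("-cache:dl2", "none")),
    }
    # Per the option dump, il1 accepts {<config>|dl1|dl2|none} and
    # il2 accepts {<config>|dl2|none}.
    for level, default, aliases in (("il1", "none", ("dl1", "dl2")),
                                    ("il2", "dl2", ("dl2",))):
        spec = opts.get(f"-cache:{level}", default)
        levels[level] = levels[spec] if spec in aliases else _cache_name(spec)
    return levels
```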



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375457 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                  10156921 # total simulation time in cycles
sim_IPC                      1.3115 # instructions per cycle
sim_CPI                      0.7625 # cycles per instruction
sim_exec_BW                  1.3169 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  39534961 # cumulative IFQ occupancy
IFQ_fcount                  9733096 # cumulative IFQ full count
ifq_occupancy                3.8924 # avg IFQ occupancy (insn's)
ifq_rate                     1.3169 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.9558 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9583 # fraction of time (cycle's) IFQ was full
RUU_count                 159062406 # cumulative RUU occupancy
RUU_fcount                  9592093 # cumulative RUU full count
ruu_occupancy               15.6605 # avg RUU occupancy (insn's)
ruu_rate                     1.3169 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.8921 # avg RUU occupant latency (cycle's)
ruu_full                     0.9444 # fraction of time (cycle's) RUU was full
LSQ_count                  84427359 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3123 # avg LSQ occupancy (insn's)
lsq_rate                     1.3169 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3121 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  263204262 # total number of slip cycles
avg_sim_slip                19.7587 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766194 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
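Several of the statistics above are ratios of raw counters rather than independent measurements. For this fft run, each of the following reproduces the reported value to the four printed decimals:

* sim_num_insn / sim_cycle gives sim_IPC (1.3115).
* IFQ_count / sim_total_insn gives ifq_latency (2.9558).
* sim_slip / sim_num_insn gives avg_sim_slip (19.7587).

The Python sketch below parses `name value # description` lines from a log like this one and recomputes those ratios so they can be checked against the reported values. The regular expression, `parse_stats`, `derived_metrics` and `FFT_EXCERPT` are hypothetical. The formulas are inferred from the numbers here rather than quoted from the simulator source.

```python
import re

# "name   value # description"; option lines (leading "-" or "#") are skipped.
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#")


def parse_stats(text):
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if not m:
            continue  # banners, options, help text, "<error: ...>" values
        name, raw = m.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. "0x00400000" or "760k"
    return stats


def derived_metrics(s):
    cycles, committed, executed = s["sim_cycle"], s["sim_num_insn"], s["sim_total_insn"]
    return {
        "sim_IPC": committed / cycles,
        "sim_CPI": cycles / committed,
        "sim_exec_BW": executed / cycles,
        "sim_IPB": committed / s["sim_num_branches"],
        "ifq_occupancy": s["IFQ_count"] / cycles,
        "ifq_latency": s["IFQ_count"] / executed,
        "ruu_occupancy": s["RUU_count"] / cycles,
        "ruu_latency": s["RUU_count"] / executed,
        "avg_sim_slip": s["sim_slip"] / committed,
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    }


# Lines copied from the fft run above.
FFT_EXCERPT = """\
sim_num_insn               13320908 # total number of instructions committed
sim_num_branches             387182 # total number of branches committed
sim_total_insn             13375457 # total number of instructions executed
sim_cycle                  10156921 # total simulation time in cycles
sim_IPC                      1.3115 # instructions per cycle
sim_CPI                      0.7625 # cycles per instruction
sim_exec_BW                  1.3169 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  39534961 # cumulative IFQ occupancy
ifq_occupancy                3.8924 # avg IFQ occupancy (insn's)
ifq_latency                  2.9558 # avg IFQ occupant latency (cycle's)
RUU_count                 159062406 # cumulative RUU occupancy
ruu_occupancy               15.6605 # avg RUU occupancy (insn's)
ruu_latency                 11.8921 # avg RUU occupant latency (cycle's)
sim_slip                  263204262 # total number of slip cycles
avg_sim_slip                19.7587 # the average slip between issue and retirement
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
"""

if __name__ == "__main__":
    stats = parse_stats(FFT_EXCERPT)
    for name, value in derived_metrics(stats).items():
        # Reported values are rounded to 4 decimals, so they agree within 5e-5.
        print(f"{name:14s} recomputed {value:.4f}  reported {stats[name]:.4f}")
```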

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:36:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12672742 # total simulation time in cycles
sim_IPC                      1.6870 # instructions per cycle
sim_CPI                      0.5928 # cycles per instruction
sim_exec_BW                  1.6926 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49511398 # cumulative IFQ occupancy
IFQ_fcount                 11758677 # cumulative IFQ full count
ifq_occupancy                3.9069 # avg IFQ occupancy (insn's)
ifq_rate                     1.6926 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3082 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 201184277 # cumulative RUU occupancy
RUU_fcount                 12538552 # cumulative RUU full count
ruu_occupancy               15.8754 # avg RUU occupancy (insn's)
ruu_rate                     1.6926 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3792 # avg RUU occupant latency (cycle's)
ruu_full                     0.9894 # fraction of time (cycle's) RUU was full
LSQ_count                  64486579 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0886 # avg LSQ occupancy (insn's)
lsq_rate                     1.6926 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0064 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  293453405 # total number of slip cycles
avg_sim_slip                13.7261 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
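The cache counters obey a bookkeeping identity that is useful when sanity-checking a log. In this filter run, ul2.accesses (4101) equals dl1.misses + il1.misses + dl1.writebacks (3131 + 179 + 791). The same holds for the fft run above (287722 + 727 + 143444 = 431893).

The Python sketch below checks that identity, then gives a rough textbook average-memory-access-time (AMAT) estimate for data references. The estimate rests on two simplifications:

* The AMAT formula ignores the overlap an out-of-order core achieves.
* The main-memory latency model, first_chunk + inter_chunk x (bsize / bus width - 1), is an assumption. For `-mem:lat 62 2` with `-mem:width 32`, it gives 64 cycles. The burst-length options in this build may change the real latency.

All function names are hypothetical.

```python
import math

# Counters copied from the filter run above.
FILTER = {
    "il1.misses": 179,
    "dl1.accesses": 4943189,
    "dl1.misses": 3131,
    "dl1.writebacks": 791,
    "ul2.accesses": 4101,
    "ul2.misses": 2489,
}


def expected_l2_accesses(s):
    """L1 misses plus dirty L1 evictions should account for all L2 traffic."""
    return s["dl1.misses"] + s["il1.misses"] + s["dl1.writebacks"]


def mem_latency(first_chunk, inter_chunk, bus_width, block_size):
    """Assumed model: first chunk, then one inter-chunk delay per extra chunk."""
    chunks = math.ceil(block_size / bus_width)
    return first_chunk + inter_chunk * (chunks - 1)


def data_amat(s, l1_lat, l2_lat, mem_lat):
    """Textbook AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem), no overlap."""
    m_l1 = s["dl1.misses"] / s["dl1.accesses"]
    m_l2 = s["ul2.misses"] / s["ul2.accesses"]  # local L2 miss rate (ul2.miss_rate)
    return l1_lat + m_l1 * (l2_lat + m_l2 * mem_lat)


if __name__ == "__main__":
    assert expected_l2_accesses(FILTER) == FILTER["ul2.accesses"]
    # -mem:lat 62 2, -mem:width 32, 64-byte ul2 blocks.
    t_mem = mem_latency(62, 2, 32, 64)
    print(f"t_mem = {t_mem} cycles, data AMAT ~ {data_amat(FILTER, 3, 12, t_mem):.3f} cycles")
```

Although filter's ul2 local miss rate is high (0.6069), the estimate stays close to the 3-cycle L1 hit latency. This is because only about 0.06% of data accesses miss in dl1.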

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 32 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:36:27 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
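
Each cache and TLB option above uses the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, so total capacity is `nsets * bsize * assoc`. The sketch below (with hypothetical helper names `CacheConfig` and `parse_config`) decodes the strings used in this configuration. For the TLBs, `<bsize>` is the page size, so the block count is the number of TLB entries and the capacity is the TLB reach.

```python
from typing import NamedTuple

# Replacement-policy letters accepted in the <repl> field.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes (the page size, for a TLB)
    assoc: int
    repl: str

    @property
    def blocks(self) -> int:
        """Total number of blocks (for a TLB: number of entries)."""
        return self.nsets * self.assoc

    @property
    def capacity(self) -> int:
        """Capacity in bytes (for a TLB: its reach)."""
        return self.nsets * self.bsize * self.assoc


def parse_config(spec: str) -> CacheConfig:
    """Decode '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l",
             "itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    c = parse_config(spec)
    print(f"{c.name}: {c.blocks} x {c.bsize} B, {c.assoc}-way {c.repl}, "
          f"{c.capacity // 1024} KiB")
```

With these settings each L1 cache holds 64 KiB, the unified L2 holds 512 KiB, the 64-entry instruction TLB maps 256 KiB and the 128-entry data TLB maps 512 KiB.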

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861587 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  39513361 # total simulation time in cycles
sim_IPC                      0.7051 # instructions per cycle
sim_CPI                      1.4183 # cycles per instruction
sim_exec_BW                  0.7051 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 157942345 # cumulative IFQ occupancy
IFQ_fcount                 39485346 # cumulative IFQ full count
ifq_occupancy                3.9972 # avg IFQ occupancy (insn's)
ifq_rate                     0.7051 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.6688 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9993 # fraction of time (cycle's) IFQ was full
RUU_count                 631773920 # cumulative RUU occupancy
RUU_fcount                 39484526 # cumulative RUU full count
ruu_occupancy               15.9889 # avg RUU occupancy (insn's)
ruu_rate                     0.7051 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.6754 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 191980271 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8586 # avg LSQ occupancy (insn's)
lsq_rate                     0.7051 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.8905 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  860262682 # total number of slip cycles
avg_sim_slip                30.8784 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
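
Most of the rates above are formula statistics computed from raw counters. As a cross-check, this sketch parses `name value # description` lines (a hypothetical `parse_stats` helper that skips hex, suffixed and error values) and re-derives IPC, CPI, IFQ/RUU occupancy, IFQ latency and average slip for the alphaBlend run. The formulas are inferred from the stat descriptions; they reproduce the printed values to four decimal places.

```python
def parse_stats(text: str) -> dict:
    """Map stat name -> float for every 'name value # description' line."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # skip headers, multi-value lines and '<error: ...>' entries
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # skip hex addresses, "1480k", "false", etc.
    return stats


# Raw counters copied from the alphaBlend run above.
SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_total_insn             27861587 # total number of instructions executed
sim_cycle                  39513361 # total simulation time in cycles
IFQ_count                 157942345 # cumulative IFQ occupancy
RUU_count                 631773920 # cumulative RUU occupancy
sim_slip                  860262682 # total number of slip cycles
"""

s = parse_stats(SAMPLE)
ipc = s["sim_num_insn"] / s["sim_cycle"]            # sim_IPC
cpi = s["sim_cycle"] / s["sim_num_insn"]            # sim_CPI
ifq_occupancy = s["IFQ_count"] / s["sim_cycle"]     # avg insns held in the IFQ
ifq_latency = s["IFQ_count"] / s["sim_total_insn"]  # avg cycles an insn waits in the IFQ
ruu_occupancy = s["RUU_count"] / s["sim_cycle"]     # avg insns held in the RUU
avg_slip = s["sim_slip"] / s["sim_num_insn"]        # avg_sim_slip

for label, value in (("sim_IPC", ipc), ("sim_CPI", cpi),
                     ("ifq_occupancy", ifq_occupancy), ("ifq_latency", ifq_latency),
                     ("ruu_occupancy", ruu_occupancy), ("avg_sim_slip", avg_slip)):
    print(f"{label:14s} {value:.4f}")
```

An RUU occupancy of about 16 with a 16-entry RUU is consistent with `ruu_full` being 0.9993 in this run.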

--------------------------------------------------------------------------------
sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput matrix 
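
Because every run in this log repeats the same long command line, it can help to diff two of them. This is a minimal sketch, assuming options always start with `-` and the last token is the simulated program (true for the command lines recorded here). `parse_cmdline` and `diff_runs` are hypothetical helpers. Comparing the alphaBlend run with this matrix run shows that only the memory bus width and the benchmark changed.

```python
import shlex
from typing import Dict, List, Optional, Tuple

# Options shared by the alphaBlend and matrix command lines in this log.
COMMON = ("sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 "
          "-lsq:size 32 -res:imult 2 -res:fpmult 2 "
          "-cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l "
          "-cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 "
          "-cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8")
RUN_ALPHABLEND = COMMON + (" -mem:width 32 -mem:lat 62 2 -mem:minBurstLength 4"
                           " -redir:sim tempOutput alphaBlend")
RUN_MATRIX = COMMON + (" -mem:width 64 -mem:lat 62 2 -mem:minBurstLength 4"
                       " -redir:sim tempOutput matrix")


def parse_cmdline(cmd: str) -> Tuple[Dict[str, List[str]], str]:
    """Split a command line into {option: [values]} plus the program name."""
    tokens = shlex.split(cmd)
    program = tokens[-1]              # assumption: the program is the last token
    opts: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for tok in tokens[1:-1]:          # skip 'sim-outorder' and the program
        if tok.startswith("-"):
            current = tok
            opts[current] = []
        elif current is not None:
            opts[current].append(tok)
    return opts, program


def diff_runs(a: str, b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {option: (value in a, value in b)} for every option that differs."""
    (oa, pa), (ob, pb) = parse_cmdline(a), parse_cmdline(b)

    def show(v: Optional[List[str]]) -> Optional[str]:
        return None if v is None else " ".join(v)

    changed = {k: (show(oa.get(k)), show(ob.get(k)))
               for k in sorted(oa.keys() | ob.keys())
               if oa.get(k) != ob.get(k)}
    if pa != pb:
        changed["<program>"] = (pa, pb)
    return changed


print(diff_runs(RUN_ALPHABLEND, RUN_MATRIX))
```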

sim: simulation started @ Thu Dec 15 11:36:51 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
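
The `-mem:lat` and `-mem:width` settings determine how long a cache block takes to arrive from main memory. The sketch below follows the stock SimpleScalar 3.0 model (`mem_access_latency()` in `sim-outorder.c`): a block moves in `ceil(bsize / width)` chunks, the first costing `<first_chunk>` cycles and each later one `<inter_chunk>` cycles. This build also accepts `-mem:minBurstLength` and `-mem:maxBurstLength`, which are not part of stock SimpleScalar; their effect is not known here and is deliberately not modelled, so treat the numbers as a baseline only.

```python
def mem_access_latency(block_size: int, bus_width: int,
                       first_chunk: int, inter_chunk: int) -> int:
    """Cycles to fetch one block: first chunk plus (chunks - 1) later chunks.

    Stock SimpleScalar model only; burst-length options are ignored.
    """
    chunks = -(-block_size // bus_width)   # ceiling division
    assert chunks > 0
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte L2 blocks (ul2:1024:64:8:l) under the memory setups seen in this log:
# (bus width in bytes, (first_chunk, inter_chunk)).
for width, lat in ((8, (91, 10)), (32, (62, 2)), (64, (62, 2))):
    print(f"width {width:2d} B: {mem_access_latency(64, width, *lat)} cycles")
```

Under this model an L2 miss costs 161 cycles with the 8-byte bus (`-mem:lat 91 10`), 64 cycles with the 32-byte bus and 62 cycles with the 64-byte bus, which is why widening the bus from 32 to 64 bytes changes only the inter-chunk term.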

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6781493 # total simulation time in cycles
sim_IPC                      1.9328 # instructions per cycle
sim_CPI                      0.5174 # cycles per instruction
sim_exec_BW                  1.9390 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25251520 # cumulative IFQ occupancy
IFQ_fcount                  6187277 # cumulative IFQ full count
ifq_occupancy                3.7236 # avg IFQ occupancy (insn's)
ifq_rate                     1.9390 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9204 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104283269 # cumulative RUU occupancy
RUU_fcount                  5623438 # cumulative RUU full count
ruu_occupancy               15.3776 # avg RUU occupancy (insn's)
ruu_rate                     1.9390 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9309 # avg RUU occupant latency (cycle's)
ruu_full                     0.8292 # fraction of time (cycle's) RUU was full
LSQ_count                  31850074 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6966 # avg LSQ occupancy (insn's)
lsq_rate                     1.9390 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4222 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153184761 # total number of slip cycles
avg_sim_slip                11.6871 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357593 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
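The statistics above follow a fixed `<name> <value> # <description>` layout, and this file concatenates several runs separated by dashed lines. Below is a minimal Python sketch for pulling each run's numbers into a dictionary. The helper names are hypothetical, and the regular expression is an assumption based only on the lines visible here. Option lines begin with `-` or `#`, so they are deliberately skipped.

```python
import re

# "<name> <value> # <description>"; names use letters, digits, '_' and '.'.
# Option lines start with '-' or '#' and banner/help lines lack this shape.
STAT_RE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>\S+)\s+#")

def parse_value(text):
    """Convert a printed value to int or float where possible."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    for conv in (int, float):
        try:
            return conv(text)
        except ValueError:
            pass
    return text  # values such as "96k" are kept as printed

def parse_stats(lines):
    """Map stat name -> value for one run's statistics section."""
    stats = {}
    for line in lines:
        m = STAT_RE.match(line.strip())
        if m:
            stats[m.group("name")] = parse_value(m.group("value"))
    return stats

def split_runs(lines):
    """Split a concatenated log on its all-dash separator lines."""
    runs, current = [], []
    for line in lines:
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            runs.append(current)
            current = []
        else:
            current.append(line)
    runs.append(current)
    return runs
```

For example, `[parse_stats(r) for r in split_runs(open("tempOutput"))]` yields one dictionary per run. For the matrix run, dividing `mem.ptab_misses` by `mem.ptab_accesses` (7696533 / 77357593) reproduces the printed `mem.ptab_miss_rate` of 0.0995.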

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput sort 

sim: simulation started @ Thu Dec 15 11:37:00 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
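The `-mem:lat <first_chunk> <inter_chunk>` and `-mem:width` options combine into a per-block memory latency. The sketch below uses the formula of the stock SimpleScalar 3.0 `sim-outorder` memory model: the first chunk pays the full latency, and each further bus-width chunk adds `<inter_chunk>` cycles. It ignores `-mem:maxBurstLength` and `-mem:minBurstLength`. These appear to be local additions that are absent from stock 3.0, so the actual timing of this particular build may differ.

```python
def mem_access_latency(block_bytes, bus_bytes, first_chunk, inter_chunk):
    """Cycles to move one block over the memory bus (stock 3.0 model)."""
    if block_bytes <= 0 or bus_bytes <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = (block_bytes + bus_bytes - 1) // bus_bytes  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)

# This run: -mem:lat 62 2 -mem:width 64, with 64-byte ul2 blocks.
print(mem_access_latency(64, 64, 62, 2))
# matrix run: -mem:lat 91 10 -mem:width 8, same 64-byte blocks.
print(mem_access_latency(64, 8, 91, 10))
```

With a 64-byte bus, a 64-byte ul2 block moves in one chunk (62 cycles). The matrix run's 8-byte bus needs 8 chunks, for 91 + 7 * 10 = 161 cycles.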

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
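The range grammar above is compact enough to parse by hand. The following hypothetical Python helper is not part of SimpleScalar. It classifies a range and resolves a `+` end relative to a numeric start. It assumes numbers are decimal unless prefixed with `0x`. Program symbols such as `main` are left unresolved, because resolving them needs the binary's symbol table.

```python
from dataclasses import dataclass
from typing import Optional, Union

Bound = Optional[Union[int, str]]  # number, unresolved symbol/offset, or open end

KINDS = {"@": "address", "#": "cycle"}

@dataclass
class PtraceRange:
    kind: str    # "address", "cycle" or "instruction"
    start: Bound
    end: Bound

def _number_or_symbol(text):
    """Decimal, or hex with a 0x prefix (assumption); otherwise a symbol."""
    try:
        return int(text, 16) if text.lower().startswith("0x") else int(text)
    except ValueError:
        return text

def parse_ptrace_range(spec):
    start_txt, _, end_txt = spec.partition(":")
    kind = "instruction"
    if start_txt[:1] in KINDS:
        kind, start_txt = KINDS[start_txt[0]], start_txt[1:]
    start = _number_or_symbol(start_txt) if start_txt else None
    end = None
    if end_txt.startswith("+"):
        offset = _number_or_symbol(end_txt[1:])
        if isinstance(start, int) and isinstance(offset, int):
            end = start + offset  # e.g. 1000:+100 == 1000:1100
        else:
            end = end_txt         # symbolic start: keep "+N" for later resolution
    elif end_txt:
        if end_txt[0] in KINDS:
            end_txt = end_txt[1:]  # prefix assumed to match the start's kind
        end = _number_or_symbol(end_txt)
    return PtraceRange(kind, start, end)

for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, "->", parse_ptrace_range(spec))
```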

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X  (the order of the -bpred:2lev arguments)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == N * 2^W), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
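The sample-predictor rules can be checked mechanically. The hypothetical Python helper below names a configuration given in the `-bpred:2lev <l1size> <l2size> <hist_size> <xor>` argument order, that is N, M, W, X. It assumes N counts first-level entries, so a PAp needs M = N * 2^W counters. These runs select `-bpred bimod`, so the `-bpred:2lev 1 1024 8 0` setting in the option list is inactive.

```python
def classify_2lev(n, m, w, x):
    """Name a 2-level config given as (N, M, W, X) = (l1size, l2size, hist_size, xor)."""
    patterns = 2 ** w  # counters indexed by one W-bit history
    if n == 1 and m == patterns:
        return "gshare" if x else "GAg"
    if x:
        return "other"  # xor indexing outside the gshare sample
    if n == 1 and m > patterns:
        return "GAp"
    if n > 1 and m == patterns:
        return "PAg"
    if n > 1 and m == n * patterns:
        return "PAp"
    return "other"

print(classify_2lev(1, 1024, 8, 0))  # the -bpred:2lev value listed above
```

By these rules, `1 1024 8 0` is a GAp, since M = 1024 exceeds 2^8 = 256.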

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
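Capacity follows directly from the `<nsets>:<bsize>:<assoc>` fields. The hypothetical Python helper below is a sketch that assumes `<bsize>` is in bytes. For a TLB, `<bsize>` is the page size, so the product is the TLB's reach rather than its storage.

```python
POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}

def cache_geometry(config):
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl> into size figures.

    Pointer values such as "dl1", "dl2" or "none" are not configs and
    raise ValueError when unpacked.
    """
    name, nsets, bsize, assoc, repl = config.split(":")
    nsets, bsize, assoc = int(nsets), int(bsize), int(assoc)
    if repl not in POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return {
        "name": name,
        "blocks": nsets * assoc,                  # total lines (or TLB entries)
        "capacity_bytes": nsets * bsize * assoc,  # data capacity, or TLB reach
        "policy": POLICIES[repl],
    }

for cfg in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "dtlb:32:4096:4:l"):
    g = cache_geometry(cfg)
    print(f"{g['name']}: {g['blocks']} blocks, {g['capacity_bytes'] // 1024} KiB, {g['policy']}")
```

For the runs in this log, `dl1:512:64:2:l` and `il1:512:64:2:l` come to 64 KiB each and `ul2:1024:64:8:l` comes to 512 KiB. `dtlb:32:4096:4:l` holds 128 entries, which reach 512 KiB.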

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264625 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6312361 # total simulation time in cycles
sim_IPC                      1.8347 # instructions per cycle
sim_CPI                      0.5450 # cycles per instruction
sim_exec_BW                  1.9430 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18574123 # cumulative IFQ occupancy
IFQ_fcount                  3815371 # cumulative IFQ full count
ifq_occupancy                2.9425 # avg IFQ occupancy (insns)
ifq_rate                     1.9430 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5144 # avg IFQ occupant latency (cycles)
ifq_full                     0.6044 # fraction of time (cycles) IFQ was full
RUU_count                  76542168 # cumulative RUU occupancy
RUU_fcount                  3213465 # cumulative RUU full count
ruu_occupancy               12.1258 # avg RUU occupancy (insns)
ruu_rate                     1.9430 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2409 # avg RUU occupant latency (cycles)
ruu_full                     0.5091 # fraction of time (cycles) RUU was full
LSQ_count                  31963114 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0636 # avg LSQ occupancy (insns)
lsq_rate                     1.9430 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6061 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  122572378 # total number of slip cycles
avg_sim_slip                10.5835 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen          434901 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., dir-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917918 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
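Several of the printed rates are plain ratios of counters from the same run. The Python sketch below recomputes a few of them from the sort run's raw counts, which are copied by hand from the statistics above. The formulas are inferred from the printed values, which they reproduce to four decimal places. Two details stand out. The non-RAS JR rate divides by non-RAS JRs seen (1223 / 8193), not by all JRs. Also, `bpred_bimod.misses` equals updates minus direction hits.

```python
# Raw counters copied from the sort run's statistics above.
SORT = {
    "sim_num_insn": 11581513,
    "sim_num_branches": 3128464,
    "sim_cycle": 6312361,
    "bpred_bimod.updates": 3128464,
    "bpred_bimod.dir_hits": 2990777,
    "bpred_bimod.misses": 137687,
    "bpred_bimod.jr_non_ras_hits.PP": 1223,
    "bpred_bimod.jr_non_ras_seen.PP": 8193,
    "dl1.accesses": 4497784,
    "dl1.misses": 11205,
}

def derived_rates(s):
    """Recompute printed ratios from raw counters (formulas inferred)."""
    return {
        "sim_IPC": s["sim_num_insn"] / s["sim_cycle"],
        "sim_CPI": s["sim_cycle"] / s["sim_num_insn"],
        "sim_IPB": s["sim_num_insn"] / s["sim_num_branches"],
        "bpred_dir_rate": s["bpred_bimod.dir_hits"] / s["bpred_bimod.updates"],
        # Divides by non-RAS JRs seen, not by all JRs seen.
        "bpred_jr_non_ras_rate": s["bpred_bimod.jr_non_ras_hits.PP"]
        / s["bpred_bimod.jr_non_ras_seen.PP"],
        "dl1.miss_rate": s["dl1.misses"] / s["dl1.accesses"],
    }

rates = derived_rates(SORT)
for name, value in rates.items():
    print(f"{name:24s} {value:.4f}")

# In this run updates equal committed branches, and misses == updates - dir_hits.
assert SORT["bpred_bimod.updates"] - SORT["bpred_bimod.dir_hits"] == SORT["bpred_bimod.misses"]
```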

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput fft 

sim: simulation started @ Thu Dec 15 11:37:08 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of insts to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALUs available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALUs available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addresses (multiple uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13374964 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   8993067 # total simulation time in cycles
sim_IPC                      1.4812 # instructions per cycle
sim_CPI                      0.6751 # cycles per instruction
sim_exec_BW                  1.4873 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  35056624 # cumulative IFQ occupancy
IFQ_fcount                  8613511 # cumulative IFQ full count
ifq_occupancy                3.8982 # avg IFQ occupancy (insns)
ifq_rate                     1.4873 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6211 # avg IFQ occupant latency (cycles)
ifq_full                     0.9578 # fraction of time (cycles) IFQ was full
RUU_count                 141134545 # cumulative RUU occupancy
RUU_fcount                  8472629 # cumulative RUU full count
ruu_occupancy               15.6937 # avg RUU occupancy (insns)
ruu_rate                     1.4873 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.5521 # avg RUU occupant latency (cycles)
ruu_full                     0.9421 # fraction of time (cycles) RUU was full
LSQ_count                  73758299 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2017 # avg LSQ occupancy (insns)
lsq_rate                     1.4873 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.5147 # avg LSQ occupant latency (cycles)
lsq_full                     0.0000 # fraction of time (cycles) LSQ was full
sim_slip                  234608570 # total number of slip cycles
avg_sim_slip                17.6121 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JRs
bpred_bimod.jr_seen           89860 # total number of JRs seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JRs
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JRs seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., dir-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766206 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput filter 

sim: simulation started @ Thu Dec 15 11:37:19 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced, and those that start with `#' designate a cycle count
  range.  All other range values represent an instruction count range.  If
  the second argument is prefixed with `+', it is a value relative to the
  first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may be
  used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
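
  The range grammar above can be sketched as a small parser. This is a
  hypothetical Python helper, not SimpleScalar's own range parser. It
  assumes numeric endpoints are decimal or 0x-prefixed hex, resolves program
  symbols only through a caller-supplied `symbols` dictionary, and rejects
  ranges whose two endpoints are of different kinds.

```python
# Endpoint prefixes from the help text; no prefix means an instruction count.
KINDS = {"@": "addr", "#": "cycle"}


def _endpoint(tok, symbols):
    """Split one endpoint into (kind, value); value is None if it is empty."""
    kind = "insn"
    if tok and tok[0] in KINDS:
        kind, tok = KINDS[tok[0]], tok[1:]
    if not tok:
        return kind, None
    if tok in symbols:              # program symbol such as "main"
        return kind, symbols[tok]
    return kind, int(tok, 0)        # decimal, or hex with a 0x prefix


def parse_range(spec, symbols=None):
    """Parse '<start>:<end>' into (kind, start, end); open ends are None."""
    symbols = symbols or {}
    lo, sep, hi = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in range {spec!r}")
    kind, start = _endpoint(lo, symbols)
    if hi.startswith("+"):          # end is relative to the start
        _, delta = _endpoint(hi[1:], symbols)
        if start is None or delta is None:
            raise ValueError("a '+' end needs both a start and an offset")
        return kind, start, start + delta
    end_kind, end = _endpoint(hi, symbols)
    if lo and hi and end_kind != kind:
        raise ValueError("both endpoints must be the same kind")
    return (kind if lo else end_kind), start, end


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "1000:+100"):
        print(spec, parse_range(spec))
```

  For example, `parse_range("@main:+278", {"main": 0x00400200})` yields an
  address range that ends 278 past a made-up address for `main`.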

  Branch predictor configuration examples for the 2-level predictor:
    Configurations:   N, M, W, X  (the order of the -bpred:2lev arguments)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
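
  The sample table can be read as a classifier over the -bpred:2lev
  arguments <l1size> <l2size> <hist_size> <xor> (N, M, W, X). The helper
  below is a hypothetical sketch, not part of SimpleScalar. As a stated
  simplification it labels any M > 2^W as a ..p predictor, which is looser
  than the table's exact PAp condition, and it only names the gshare form
  of the xor option.

```python
def classify_2lev(n, m, w, xor):
    """Name a '-bpred:2lev N M W X' configuration, following the sample table.

    N == 1 means a single global history register (G..), N > 1 per-address
    history registers (P..).  M == 2^W means one shared table of counters
    (..g); M > 2^W means branch-address bits also select the counter (..p).
    """
    if min(n, m, w) < 1 or m & (m - 1):
        raise ValueError("sizes must be positive and M a power of two")
    if m < 1 << w:
        raise ValueError("M is smaller than 2^W; not covered by the table")
    if xor:
        if n == 1 and m == 1 << w:
            return "gshare"
        raise ValueError("only the gshare form of the xor option is named here")
    prefix = "GA" if n == 1 else "PA"
    return prefix + ("g" if m == 1 << w else "p")


if __name__ == "__main__":
    # The default 2-level config printed in the option dump: 1 1024 8 0.
    print(classify_2lev(1, 1024, 8, 0))
```

  The default `-bpred:2lev 1 1024 8 0` line comes out as GAp, since
  1024 > 2^8. These runs use `-bpred bimod`, so that 2-level configuration
  is listed but not active.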

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache (in bytes)
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
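
  Because <config> is a plain colon-separated tuple, a cache's total data
  capacity is nsets x bsize x assoc. The hypothetical helper below (not part
  of SimpleScalar) parses the string and reports that product. For the TLB
  configs the same product is the TLB reach, since <bsize> there is the
  page size.

```python
from dataclasses import dataclass

# Replacement-policy letters from the help text.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity(self):
        """Total capacity in bytes: sets x ways x block size."""
        return self.nsets * self.assoc * self.bsize


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "itlb:16:4096:4:l"):
        cfg = parse_cache_config(spec)
        print(cfg.name, cfg.capacity // 1024, "KB", cfg.repl)
```

  With this run's options, dl1:512:64:2:l gives 64 KB, ul2:1024:64:8:l gives
  512 KB, and itlb:16:4096:4:l has 16 x 4 = 64 entries covering 256 KB.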

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12533490 # total simulation time in cycles
sim_IPC                      1.7058 # instructions per cycle
sim_CPI                      0.5862 # cycles per instruction
sim_exec_BW                  1.7114 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48997542 # cumulative IFQ occupancy
IFQ_fcount                 11630213 # cumulative IFQ full count
ifq_occupancy                3.9093 # avg IFQ occupancy (insn's)
ifq_rate                     1.7114 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2843 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199125815 # cumulative RUU occupancy
RUU_fcount                 12410088 # cumulative RUU full count
ruu_occupancy               15.8875 # avg RUU occupancy (insn's)
ruu_rate                     1.7114 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2832 # avg RUU occupant latency (cycle's)
ruu_full                     0.9902 # fraction of time (cycle's) RUU was full
LSQ_count                  63841035 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0936 # avg LSQ occupancy (insn's)
lsq_rate                     1.7114 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9763 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290749771 # total number of slip cycles
avg_sim_slip                13.5996 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto the ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886464 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
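
The ratio statistics in these reports are computed from the raw counters, as
their comments state (miss rate = misses/ref; IPC = instructions committed
per cycle). The sketch below is a hypothetical helper, not part of
SimpleScalar. It parses `name value # comment` lines and recomputes a few
ratios from the counters of the filter run above, skipping hex addresses,
`120k`-style sizes and `<error: ...>` entries.

```python
import re

# "name   value # description" with a plain decimal value. Hex addresses
# (0x00400000), sizes such as "120k" and "<error: ...>" entries do not match.
STAT_LINE = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?)\s+#")


def parse_stats(text):
    """Return {stat_name: float} for every numeric stat line in `text`."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats


def derived(stats):
    """Recompute ratio stats from the raw counters they are built from."""
    return {
        "sim_IPC": stats["sim_num_insn"] / stats["sim_cycle"],
        "sim_CPI": stats["sim_cycle"] / stats["sim_num_insn"],
        "sim_IPB": stats["sim_num_insn"] / stats["sim_num_branches"],
        "dl1.miss_rate": stats["dl1.misses"] / stats["dl1.accesses"],
        "ul2.miss_rate": stats["ul2.misses"] / stats["ul2.accesses"],
    }


# Counters copied from the filter run above.
SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_cycle                  12533490 # total simulation time in cycles
dl1.accesses                4943189 # total number of accesses
dl1.misses                     3131 # total number of misses
ul2.accesses                   4101 # total number of accesses
ul2.misses                     2489 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                   120k # total size of memory pages allocated
"""

if __name__ == "__main__":
    for name, value in derived(parse_stats(SAMPLE)).items():
        print(f"{name:15s} {value:.4f}")
```

Rounded to four decimals, the recomputed values match the reported
sim_IPC (1.7058), sim_CPI (0.5862), sim_IPB (12.9780), dl1.miss_rate
(0.0006) and ul2.miss_rate (0.6069).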

--------------------------------------------------------------------------------
sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -mem:width 64 -mem:lat 62 2 -mem:minBurstLength 4 -redir:sim tempOutput alphaBlend 

sim: simulation started @ Thu Dec 15 11:37:33 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim       tempOutput # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         62 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 64 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861091 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  36709163 # total simulation time in cycles
sim_IPC                      0.7589 # instructions per cycle
sim_CPI                      1.3176 # cycles per instruction
sim_exec_BW                  0.7590 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 146777385 # cumulative IFQ occupancy
IFQ_fcount                 36694106 # cumulative IFQ full count
ifq_occupancy                3.9984 # avg IFQ occupancy (insn's)
ifq_rate                     0.7590 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2682 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 587111414 # cumulative RUU occupancy
RUU_fcount                 36693410 # cumulative RUU full count
ruu_occupancy               15.9936 # avg RUU occupancy (insn's)
ruu_rate                     0.7590 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.0728 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 178485909 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8622 # avg LSQ occupancy (insn's)
lsq_rate                     0.7590 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4063 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  802106062 # total number of slip cycles
avg_sim_slip                28.7909 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto the ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off the ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017734 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

--------------------------------------------------------------------------------
