sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:49:43 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
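
  The range syntax above can be sketched as a small parser. This is an
  illustrative Python sketch, not the simulator's own code: the names
  Endpoint, parse_range and absolute_end are hypothetical, and it assumes
  that a `+' end uses the same unit as the start of the range.

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    kind: str       # "addr" for '@', "cycle" for '#', "inst" otherwise
    value: str      # raw text: a number or a program symbol
    relative: bool  # True when the end was written with a leading '+'


_PREFIX_KIND = {"@": "addr", "#": "cycle"}


def _endpoint(text: str, allow_plus: bool, inherited_kind: str) -> Optional[Endpoint]:
    """Parse one side of the range; an empty side means 'not specified'."""
    if not text:
        return None
    if text[0] in _PREFIX_KIND:
        kind, value, relative = _PREFIX_KIND[text[0]], text[1:], False
    elif allow_plus and text[0] == "+":
        kind, value, relative = inherited_kind, text[1:], True
    else:
        kind, value, relative = "inst", text, False
    if not value:
        raise ValueError("range endpoint has a prefix but no value")
    return Endpoint(kind, value, relative)


def parse_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' into two endpoints."""
    if spec.count(":") != 1:
        raise ValueError(f"expected exactly one ':' in {spec!r}")
    start_text, end_text = spec.split(":")
    start = _endpoint(start_text, allow_plus=False, inherited_kind="inst")
    # Assumption: a '+' end is measured in the same unit as the start.
    end = _endpoint(end_text, allow_plus=True,
                    inherited_kind=start.kind if start else "inst")
    return start, end


def absolute_end(start: Endpoint, end: Endpoint) -> int:
    """Resolve a purely numeric end, e.g. 1000:+100 -> 1100."""
    if end.relative:
        return int(start.value) + int(end.value)
    return int(end.value)
```

  Symbols such as `main' are kept as raw text, since resolving them would
  need the program's symbol table.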

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
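
  As a worked example, the sample predictors can be turned into concrete
  -bpred:2lev arguments. The Python sketch below is hypothetical helper
  code, not part of SimpleScalar. It assumes the option's documented field
  order <l1size> <l2size> <hist_size> <xor> corresponds to N, M, W, X
  (l1size = N, l2size = M, hist_size = W).

```python
from typing import Tuple

TwoLev = Tuple[int, int, int, int]


def twolev_args(n: int, w: int, m: int, xor: int) -> TwoLev:
    """Order the values as <l1size> <l2size> <hist_size> <xor> (assumed N, M, W, X)."""
    return (n, m, w, xor)


def gag(w: int) -> TwoLev:
    """One global history register of width W indexing 2^W counters."""
    return twolev_args(1, w, 2 ** w, 0)


def pag(n: int, w: int) -> TwoLev:
    """N per-address history registers of width W indexing 2^W counters."""
    return twolev_args(n, w, 2 ** w, 0)


def gshare(w: int) -> TwoLev:
    """Like GAg, but history is XORed with the address for the 2nd-level index."""
    return twolev_args(1, w, 2 ** w, 1)


def as_option(args: TwoLev) -> str:
    """Format the four values as a command-line fragment."""
    return "-bpred:2lev " + " ".join(str(a) for a in args)


print(as_option(gshare(12)))
```

  Under that assumption, the default `1 1024 8 0' shown in the option dump
  has the GAp shape: one history register of width 8 and M = 1024 > 2^8.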

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
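
  The <config> format can be checked with a short parser. The Python sketch
  below is illustrative only (CacheConfig and parse_cache_config are
  hypothetical names). It takes total capacity to be nsets * bsize * assoc,
  which is the usual definition for a set-associative cache.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str    # one of the keys of REPLACEMENT

    @property
    def capacity_bytes(self) -> int:
        # Every set holds `assoc` blocks of `bsize` bytes each.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# The two data-side caches used on the command line of this run.
for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KB", REPLACEMENT[cfg.repl])
```

  For this run that gives a 64 KB two-way dl1 and a 512 KB eight-way ul2,
  both with LRU replacement.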

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6812901 # total simulation time in cycles
sim_IPC                      1.9239 # instructions per cycle
sim_CPI                      0.5198 # cycles per instruction
sim_exec_BW                  1.9300 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25365952 # cumulative IFQ occupancy
IFQ_fcount                  6215885 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9300 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9291 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104742245 # cumulative RUU occupancy
RUU_fcount                  5652046 # cumulative RUU full count
ruu_occupancy               15.3741 # avg RUU occupancy (insn's)
ruu_rate                     1.9300 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9658 # avg RUU occupant latency (cycle's)
ruu_full                     0.8296 # fraction of time (cycle's) RUU was full
LSQ_count                  32003418 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6975 # avg LSQ occupancy (insn's)
lsq_rate                     1.9300 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4339 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153796793 # total number of slip cycles
avg_sim_slip                11.7338 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357595 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
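
The derived statistics above can be recomputed from the raw counters as a
sanity check. The Python sketch below is illustrative only: parse_stats is a
hypothetical helper, the SAMPLE text is a handful of lines copied from the
run above, and the parser assumes every statistics line has the form
`<name> <value> # <description>'.

```python
from typing import Dict, Union

SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6812901 # total simulation time in cycles
sim_IPC                      1.9239 # instructions per cycle
sim_CPI                      0.5198 # cycles per instruction
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
ul2.accesses                   6397 # total number of accesses
ul2.misses                     2268 # total number of misses
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
"""

Value = Union[float, str]


def parse_stats(text: str) -> Dict[str, Value]:
    """Map each statistic name to a float, or to raw text if not a decimal number."""
    stats: Dict[str, Value] = {}
    for line in text.splitlines():
        left, sep, _description = line.partition(" # ")
        fields = left.split(None, 1)
        if not sep or len(fields) != 2:
            continue  # not a '<name> <value> # <description>' line
        name, raw = fields[0], fields[1].strip()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. hex addresses or '<error: divide by zero>'
    return stats


stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
l2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]
dir_rate = stats["bpred_bimod.dir_hits"] / stats["bpred_bimod.updates"]
print(f"IPC={ipc:.4f} CPI={cpi:.4f} ul2 miss={l2_miss_rate:.4f} dir={dir_rate:.4f}")
```

The recomputed values agree with the reported sim_IPC, sim_CPI,
ul2.miss_rate and bpred_dir_rate to four decimal places. The same pattern
also matches the option dump lines, so the parser should be fed only the
text after the "simulation statistics" marker.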

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:49:51 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264689 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6355913 # total simulation time in cycles
sim_IPC                      1.8222 # instructions per cycle
sim_CPI                      0.5488 # cycles per instruction
sim_exec_BW                  1.9297 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18734699 # cumulative IFQ occupancy
IFQ_fcount                  3855515 # cumulative IFQ full count
ifq_occupancy                2.9476 # avg IFQ occupancy (insn's)
ifq_rate                     1.9297 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5275 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6066 # fraction of time (cycle's) IFQ was full
RUU_count                  77184808 # cumulative RUU occupancy
RUU_fcount                  3253593 # cumulative RUU full count
ruu_occupancy               12.1438 # avg RUU occupancy (insn's)
ruu_rate                     1.9297 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2933 # avg RUU occupant latency (cycle's)
ruu_full                     0.5119 # fraction of time (cycle's) RUU was full
LSQ_count                  32108026 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0517 # avg LSQ occupancy (insn's)
lsq_rate                     1.9297 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6179 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123359930 # total number of slip cycles
avg_sim_slip                10.6515 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917920 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:49:59 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375092 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9293401 # total simulation time in cycles
sim_IPC                      1.4334 # instructions per cycle
sim_CPI                      0.6977 # cycles per instruction
sim_exec_BW                  1.4392 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36212259 # cumulative IFQ occupancy
IFQ_fcount                  8902420 # cumulative IFQ full count
ifq_occupancy                3.8966 # avg IFQ occupancy (insn's)
ifq_rate                     1.4392 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7074 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 145760823 # cumulative RUU occupancy
RUU_fcount                  8761507 # cumulative RUU full count
ruu_occupancy               15.6843 # avg RUU occupancy (insn's)
ruu_rate                     1.4392 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.8979 # avg RUU occupant latency (cycle's)
ruu_full                     0.9428 # fraction of time (cycle's) RUU was full
LSQ_count                  76511511 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2329 # avg LSQ occupancy (insn's)
lsq_rate                     1.4392 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7204 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  241987740 # total number of slip cycles
avg_sim_slip                18.1660 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766208 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:50:10 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12569426 # total simulation time in cycles
sim_IPC                      1.7009 # instructions per cycle
sim_CPI                      0.5879 # cycles per instruction
sim_exec_BW                  1.7065 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49130150 # cumulative IFQ occupancy
IFQ_fcount                 11663365 # cumulative IFQ full count
ifq_occupancy                3.9087 # avg IFQ occupancy (insn's)
ifq_rate                     1.7065 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2904 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199657031 # cumulative RUU occupancy
RUU_fcount                 12443240 # cumulative RUU full count
ruu_occupancy               15.8843 # avg RUU occupancy (insn's)
ruu_rate                     1.7065 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3080 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  64007627 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0923 # avg LSQ occupancy (insn's)
lsq_rate                     1.7065 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9840 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291447483 # total number of slip cycles
avg_sim_slip                13.6323 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886466 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:50:23 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
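
  The range grammar above can be sketched as a small parser. This is an
  illustration only, not the simulator's own code: the function names
  parse_endpoint and parse_ptrace_range are hypothetical, values are kept as
  strings because program symbols are allowed, and how sim-outorder itself
  resolves a `+' end against the start is not modeled here.

```python
def parse_endpoint(text, allow_relative=False):
    """Classify one side of a pipetrace range.

    Returns (kind, value); (None, None) means the side was omitted.
    The value stays a string because it may be a program symbol.
    """
    if text == "":
        return (None, None)
    prefix = text[0]
    if prefix == "@":
        return ("address", text[1:])
    if prefix == "#":
        return ("cycle", text[1:])
    if prefix == "+":
        if not allow_relative:
            raise ValueError("`+' is only meaningful on the end of a range")
        return ("relative", text[1:])
    # No prefix: the help text says this is an instruction count.
    return ("instruction", text)


def parse_ptrace_range(spec):
    """Split '<start>:<end>' and classify both sides."""
    start_text, sep, end_text = spec.partition(":")
    if sep != ":":
        raise ValueError("a pipetrace range needs a ':' separator")
    start = parse_endpoint(start_text)
    end = parse_endpoint(end_text, allow_relative=True)
    return start, end


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(spec, "->", parse_ptrace_range(spec))
```

  Each of the five example ranges listed above is accepted by this sketch;
  an omitted side comes back as (None, None).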

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
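
  As a rough aid for reading <config> strings, the sketch below splits one
  into its five fields and derives the data capacity as nsets * bsize *
  assoc. The names CacheConfig and parse_cache_config are hypothetical and
  not part of SimpleScalar; tag and state storage are ignored, and for a TLB
  the same product is the address range covered rather than a storage size.

```python
from dataclasses import dataclass

# Replacement policy letters as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Data capacity only: sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configurations taken from the command lines in this log.
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l", "ul2:512:128:8:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
              f"{REPL_POLICIES[cfg.repl]}")
```

  Under this arithmetic the two ul2 configurations used in this log
  (1024 sets of 64-byte blocks and 512 sets of 128-byte blocks, both 8-way)
  have the same data capacity, so those runs differ in block size rather
  than in total L2 size.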

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861219 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37432827 # total simulation time in cycles
sim_IPC                      0.7443 # instructions per cycle
sim_CPI                      1.3436 # cycles per instruction
sim_exec_BW                  0.7443 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 149658665 # cumulative IFQ occupancy
IFQ_fcount                 37414426 # cumulative IFQ full count
ifq_occupancy                3.9981 # avg IFQ occupancy (insn's)
ifq_rate                     0.7443 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3716 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 598637222 # cumulative RUU occupancy
RUU_fcount                 37413698 # cumulative RUU full count
ruu_occupancy               15.9923 # avg RUU occupancy (insn's)
ruu_rate                     0.7443 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.4864 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 181968325 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8612 # avg LSQ occupancy (insn's)
lsq_rate                     0.7443 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5312 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  817114222 # total number of slip cycles
avg_sim_slip                29.3296 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017736 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
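
The derived figures in a statistics block can be cross-checked against the raw counters. The sketch below is an assumption-laden helper, not part of the simulator: the name parse_stats and the regular expression are hypothetical, it expects the lines of a single run, and it skips option lines and any value that is not a plain number (for example hexadecimal addresses, `120k', or `<error: divide by zero>').

```python
import re

# "<name> <value> # <description>", as printed in the statistics block.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#\s")


def parse_stats(lines):
    """Collect numeric statistics from one run into a dict of floats."""
    stats = {}
    for line in lines:
        match = STAT_LINE.match(line)
        if match is None:
            continue
        name, raw = match.groups()
        if name.startswith("-"):
            continue  # option dump line such as "-seed 1 # ..."
        try:
            stats[name] = float(raw)
        except ValueError:
            continue  # non-numeric value, e.g. "0x00400000" or "1480k"
    return stats


if __name__ == "__main__":
    # Counters copied from the alphaBlend run above.
    sample = [
        "sim_num_insn               27859692 # total number of instructions committed",
        "sim_cycle                  37432827 # total simulation time in cycles",
        "dl1.accesses                8653398 # total number of accesses",
        "dl1.misses                  2572213 # total number of misses",
    ]
    s = parse_stats(sample)
    print(f"IPC           {s['sim_num_insn'] / s['sim_cycle']:.4f}")
    print(f"dl1 miss rate {s['dl1.misses'] / s['dl1.accesses']:.4f}")
```

The two printed ratios correspond to the sim_IPC and dl1.miss_rate lines reported for the same run.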

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:50:46 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13149001 # total number of instructions executed
sim_total_refs              4034210 # total number of loads and stores executed
sim_total_loads             3020646 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6760296 # total simulation time in cycles
sim_IPC                      1.9388 # instructions per cycle
sim_CPI                      0.5158 # cycles per instruction
sim_exec_BW                  1.9450 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25170029 # cumulative IFQ occupancy
IFQ_fcount                  6166891 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9450 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9142 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9122 # fraction of time (cycle's) IFQ was full
RUU_count                 103956205 # cumulative RUU occupancy
RUU_fcount                  5603018 # cumulative RUU full count
ruu_occupancy               15.3775 # avg RUU occupancy (insn's)
ruu_rate                     1.9450 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9060 # avg RUU occupant latency (cycle's)
ruu_full                     0.8288 # fraction of time (cycle's) RUU was full
LSQ_count                  31741058 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6952 # avg LSQ occupancy (insn's)
lsq_rate                     1.9450 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4140 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152747629 # total number of slip cycles
avg_sim_slip                11.6538 # the average slip between issue and retirement
bpred_bimod.lookups         1010976 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000660 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10190 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189507 # total number of accesses
il1.hits                   13189329 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013166 # total number of accesses
dl1.hits                    4007439 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5247 # total number of hits
ul2.misses                     1150 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1798 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189507 # total number of accesses
itlb.hits                  13189501 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357525 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:50:54 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
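
  The range syntax above can be illustrated with a small parser. This is a
  minimal sketch in Python, not SimpleScalar's own code: the names `Bound`
  and `parse_ptrace_range` are hypothetical, values are kept as text because
  program symbols are allowed, and rejecting a `+' on the start of the range
  is an assumption drawn from the format line.

```python
from typing import NamedTuple, Optional, Tuple


class Bound(NamedTuple):
    kind: str   # "addr", "cycle", "inst", or "relative"
    value: str  # left as text, since program symbols such as "main" are allowed


# Prefix characters taken from the format {{@|#}<start>}:{{@|#|+}<end>}.
_PREFIX_KINDS = {"@": "addr", "#": "cycle", "+": "relative"}


def _parse_bound(text: str, allow_relative: bool) -> Optional[Bound]:
    if not text:
        return None  # this end of the range was omitted
    kind = _PREFIX_KINDS.get(text[0])
    if kind is None:
        return Bound("inst", text)  # no prefix: instruction count
    if kind == "relative" and not allow_relative:
        # Assumption: '+' is only meaningful on the end of the range.
        raise ValueError("'+' is only valid on the end of the range")
    return Bound(kind, text[1:])


def parse_ptrace_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    """Split a pipetrace range into (start, end); None marks an omitted end."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("a range needs a ':' separator")
    return _parse_bound(start, False), _parse_bound(end, True)


if __name__ == "__main__":
    for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(spec, parse_ptrace_range(spec))
```

  A "relative" end is left unresolved here; a caller would add it to the
  start value once symbols have been looked up.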

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
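
  As a worked illustration of the <config> format, the sketch below splits a
  config string into its five fields and derives the total capacity as
  nsets * bsize * assoc, which holds for any set-associative cache. It is
  written in Python for convenience; `CacheConfig` and `parse_cache_config`
  are hypothetical names, not part of the simulator.

```python
from typing import NamedTuple

# Replacement policy codes as listed in the help text.
REPLACEMENT_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Sets x ways x block size gives the data capacity in bytes.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # The two data-side configs used by this run's command line.
    for text in ["dl1:512:64:2:l", "ul2:512:128:8:l"]:
        cfg = parse_cache_config(text)
        print(cfg.name, cfg.capacity_bytes // 1024, "KB",
              REPLACEMENT_POLICIES[cfg.repl])
```

  For the configs on this run's command line this gives a 64 KB dl1 and a
  512 KB ul2.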

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264687 # total number of instructions executed
sim_total_refs              4823980 # total number of loads and stores executed
sim_total_loads             2865502 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3197013 # total number of branches executed
sim_cycle                   6283763 # total simulation time in cycles
sim_IPC                      1.8431 # instructions per cycle
sim_CPI                      0.5426 # cycles per instruction
sim_exec_BW                  1.9518 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18465017 # cumulative IFQ occupancy
IFQ_fcount                  3788001 # cumulative IFQ full count
ifq_occupancy                2.9385 # avg IFQ occupancy (insn's)
ifq_rate                     1.9518 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5055 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6028 # fraction of time (cycle's) IFQ was full
RUU_count                  76104790 # cumulative RUU occupancy
RUU_fcount                  3185053 # cumulative RUU full count
ruu_occupancy               12.1113 # avg RUU occupancy (insn's)
ruu_rate                     1.9518 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2052 # avg RUU occupant latency (cycle's)
ruu_full                     0.5069 # fraction of time (cycle's) RUU was full
LSQ_count                  31864820 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0710 # avg LSQ occupancy (insn's)
lsq_rate                     1.9518 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5981 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122035997 # total number of slip cycles
avg_sim_slip                10.5371 # the average slip between issue and retirement
bpred_bimod.lookups         3257723 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442008 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820333 # total number of accesses
il1.hits                   12820116 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497787 # total number of accesses
dl1.hits                    4486582 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      16388 # total number of hits
ul2.misses                     2113 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1142 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820333 # total number of accesses
itlb.hits                  12820326 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514760 # total number of accesses
dtlb.hits                   4514694 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918398 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
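
The derived figures in the statistics block above (IPC, miss rates) follow
directly from the raw counters. The sketch below is a hedged example in
Python, with `parse_stats` a hypothetical helper rather than a SimpleScalar
tool: it reads `name value # comment` lines and recomputes two ratios from
the sort run so they can be compared with the reported values. It assumes
that option lines start with `-` and that non-numeric values (such as `292k`
or hex addresses) can be skipped.

```python
def parse_stats(lines):
    """Collect numeric 'name value # comment' statistics into a dict."""
    stats = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        # Skip blank lines, commented-out options, and option lines ("-seed 1").
        if len(fields) != 2 or fields[0].startswith("-"):
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # e.g. "292k" or "0x00400000"
    return stats


# Lines copied from the sort run above.
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6283763 # total simulation time in cycles
sim_IPC                      1.8431 # instructions per cycle
dl1.accesses                4497787 # total number of accesses
dl1.misses                    11205 # total number of misses
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
mem.page_mem                   292k # total size of memory pages allocated
""".splitlines()

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

if __name__ == "__main__":
    print(f"IPC           {ipc:.4f} (reported {stats['sim_IPC']:.4f})")
    print(f"dl1 miss rate {dl1_miss_rate:.4f} "
          f"(reported {stats['dl1.miss_rate']:.4f})")
```

The same check works for any `*_rate` line whose comment names its numerator
and denominator.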

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:51:02 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375528 # total number of instructions executed
sim_total_refs              6748328 # total number of loads and stores executed
sim_total_loads             3824305 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390631 # total number of branches executed
sim_cycle                   8801201 # total simulation time in cycles
sim_IPC                      1.5135 # instructions per cycle
sim_CPI                      0.6607 # cycles per instruction
sim_exec_BW                  1.5197 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  34308632 # cumulative IFQ occupancy
IFQ_fcount                  8426502 # cumulative IFQ full count
ifq_occupancy                3.8982 # avg IFQ occupancy (insn's)
ifq_rate                     1.5197 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.5650 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9574 # fraction of time (cycle's) IFQ was full
RUU_count                 138149454 # cumulative RUU occupancy
RUU_fcount                  8285599 # cumulative RUU full count
ruu_occupancy               15.6967 # avg RUU occupancy (insn's)
ruu_rate                     1.5197 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.3285 # avg RUU occupant latency (cycle's)
ruu_full                     0.9414 # fraction of time (cycle's) RUU was full
LSQ_count                  71971793 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1775 # avg LSQ occupancy (insn's)
lsq_rate                     1.5197 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.3809 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  229835623 # total number of slip cycles
avg_sim_slip                17.2538 # the average slip between issue and retirement
bpred_bimod.lookups          390942 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400102 # total number of accesses
il1.hits                   13399375 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169486 # total number of accesses
dl1.hits                    5881762 # total number of hits
dl1.misses                   287724 # total number of misses
dl1.replacements             286700 # total number of replacements
dl1.writebacks               143443 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431894 # total number of accesses
ul2.hits                     415573 # total number of hits
ul2.misses                    16321 # total number of misses
ul2.replacements              12225 # total number of replacements
ul2.writebacks                 9594 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0378 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0283 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0222 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400102 # total number of accesses
itlb.hits                  13400083 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766232 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:51:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
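
  The range syntax above can be illustrated with a small parser. This is only
  a sketch written from the help text, not the simulator's own code; the
  function name parse_ptrace_range and the returned tuple layout are
  assumptions made for illustration. Values are kept as strings because the
  help text allows program symbols (e.g. main) in place of numbers.

```python
def parse_ptrace_range(spec):
    """Split a pipetrace range '<start>:<end>' into (kind, value) pairs.

    Hypothetical helper: kinds are 'address' (@), 'cycle' (#),
    'relative' (+, end only) and 'instruction' (no prefix).
    An omitted end of the range is returned as None.
    """
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    def classify(token, allow_relative):
        if token == "":
            return None  # this end of the range was omitted
        if token[0] == "@":
            return ("address", token[1:])
        if token[0] == "#":
            return ("cycle", token[1:])
        if token[0] == "+":
            if not allow_relative:
                raise ValueError("'+' is only valid on the end of the range")
            return ("relative", token[1:])
        return ("instruction", token)

    return classify(start, False), classify(end, True)


if __name__ == "__main__":
    for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(example, "->", parse_ptrace_range(example))
```

  Resolving a relative end (+278) against the start, or a symbol against the
  program's symbol table, is left out because the log does not show how the
  simulator does it.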

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
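
  As a rough illustration of the <config> format, the sketch below splits a
  config string into its five fields and derives the total capacity as
  nsets * bsize * assoc. The function name parse_cache_config and the
  dictionary keys are assumptions for illustration, not part of the
  simulator. For the dl1 setting used in this run (dl1:512:64:2:l) the
  product is 512 * 64 * 2 = 65536 bytes.

```python
REPLACEMENT_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(config):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a dict.

    Hypothetical helper; 'size_bytes' is derived as nsets * bsize * assoc.
    """
    fields = config.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields: " + config)
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT_POLICIES:
        raise ValueError("unknown replacement strategy: " + repl)
    nsets, bsize, assoc = int(nsets), int(bsize), int(assoc)
    return {
        "name": name,
        "nsets": nsets,
        "bsize": bsize,
        "assoc": assoc,
        "repl": REPLACEMENT_POLICIES[repl],
        "size_bytes": nsets * bsize * assoc,
    }


if __name__ == "__main__":
    for cfg in ["dl1:512:64:2:l", "ul2:512:128:8:l"]:
        parsed = parse_cache_config(cfg)
        print(parsed["name"], parsed["size_bytes"] // 1024, "KB", parsed["repl"])
```

  For a TLB config such as dtlb:32:4096:4:l the same product gives the
  amount of memory the TLB can map, since <bsize> is the page size there.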

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450128 # total number of instructions executed
sim_total_refs              6589107 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12512109 # total simulation time in cycles
sim_IPC                      1.7087 # instructions per cycle
sim_CPI                      0.5852 # cycles per instruction
sim_exec_BW                  1.7143 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48915612 # cumulative IFQ occupancy
IFQ_fcount                 11609350 # cumulative IFQ full count
ifq_occupancy                3.9095 # avg IFQ occupancy (insn's)
ifq_rate                     1.7143 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2804 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9278 # fraction of time (cycle's) IFQ was full
RUU_count                 198785027 # cumulative RUU occupancy
RUU_fcount                 12387679 # cumulative RUU full count
ruu_occupancy               15.8874 # avg RUU occupancy (insn's)
ruu_rate                     1.7143 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2673 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63737014 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0940 # avg LSQ occupancy (insn's)
lsq_rate                     1.7143 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9714 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290304833 # total number of slip cycles
avg_sim_slip                13.5788 # the average slip between issue and retirement
bpred_bimod.lookups         1654487 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           57 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483042 # total number of accesses
il1.hits                   21482863 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943699 # total number of accesses
dl1.hits                    4940568 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       2843 # total number of hits
ul2.misses                     1258 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3068 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483042 # total number of accesses
itlb.hits                  21483036 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886422 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
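
The derived figures in a statistics block can be cross-checked against the raw counters, since the comments give the formulas (e.g. miss rate is misses/ref, IPC is instructions per cycle). The sketch below is one way to do that, assuming each statistic line has the shape "name value # description"; the function name parse_stats is hypothetical, and lines whose value is not a plain number (hex addresses, "760k", error strings) are simply skipped.

```python
def parse_stats(text):
    """Collect 'name value # description' lines into a {name: float} dict."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2 or fields[0].startswith("-"):
            continue  # not a statistic line (option, banner, help text, ...)
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # non-numeric value such as 0x00400000 or 760k
    return stats


SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12512109 # total simulation time in cycles
dl1.accesses                4943699 # total number of accesses
dl1.misses                     3131 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    # Recompute the derived values from the raw counters.
    print("IPC          ", round(s["sim_num_insn"] / s["sim_cycle"], 4))
    print("dl1 miss rate", round(s["dl1.misses"] / s["dl1.accesses"], 4))
```

With the counters copied from the run above, the recomputed values round to the reported sim_IPC (1.7087) and dl1.miss_rate (0.0006).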

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:51:26 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861004 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203829 # total number of loads executed
sim_total_stores       1449774.0000 # total number of stores executed
sim_total_branches           481844 # total number of branches executed
sim_cycle                  36267790 # total simulation time in cycles
sim_IPC                      0.7682 # instructions per cycle
sim_CPI                      1.3018 # cycles per instruction
sim_exec_BW                  0.7682 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 145015527 # cumulative IFQ occupancy
IFQ_fcount                 36253643 # cumulative IFQ full count
ifq_occupancy                3.9985 # avg IFQ occupancy (insn's)
ifq_rate                     0.7682 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2050 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 580064419 # cumulative RUU occupancy
RUU_fcount                 36252966 # cumulative RUU full count
ruu_occupancy               15.9939 # avg RUU occupancy (insn's)
ruu_rate                     0.7682 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.8199 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 176369278 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8630 # avg LSQ occupancy (insn's)
lsq_rate                     0.7682 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3303 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  792940983 # total number of slip cycles
avg_sim_slip                28.4619 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6083842 # total number of hits
dl1.misses                  2569556 # total number of misses
dl1.replacements            2568532 # total number of replacements
dl1.writebacks               957260 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2969 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2968 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3527027 # total number of accesses
ul2.hits                    3492843 # total number of hits
ul2.misses                    34184 # total number of misses
ul2.replacements              30088 # total number of replacements
ul2.writebacks                10009 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0097 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0085 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0028 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017732 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:51:49 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
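
Note: the `-cache:*` values above use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, so the total capacity of each cache is nsets * bsize * assoc. The short Python sketch below works that arithmetic out for the configurations used in this run. The helper name `cache_capacity_bytes` is hypothetical and is not part of SimpleScalar; it only illustrates the calculation.

```python
def cache_capacity_bytes(config: str) -> int:
    """Capacity in bytes for a '<name>:<nsets>:<bsize>:<assoc>:<repl>' string.

    Hypothetical helper for reading this log; not a SimpleScalar API.
    """
    name, nsets, bsize, assoc, repl = config.split(":")
    if repl not in ("l", "f", "r"):  # LRU, FIFO, random
        raise ValueError(f"unknown replacement policy in {config!r}")
    return int(nsets) * int(bsize) * int(assoc)


# Configurations taken from the option list of this run.
for cfg in ("il1:512:64:2:l", "dl1:512:64:2:l", "ul2:256:256:8:l"):
    print(f"{cfg:20s} {cache_capacity_bytes(cfg) // 1024} KB")
```

Under these settings each L1 cache holds 64 KB and the unified L2 holds 512 KB.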

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149022 # total number of instructions executed
sim_total_refs              4034214 # total number of loads and stores executed
sim_total_loads             3020650 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   6732020 # total simulation time in cycles
sim_IPC                      1.9470 # instructions per cycle
sim_CPI                      0.5136 # cycles per instruction
sim_exec_BW                  1.9532 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25064594 # cumulative IFQ occupancy
IFQ_fcount                  6140534 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9532 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9062 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9121 # fraction of time (cycle's) IFQ was full
RUU_count                 103536402 # cumulative RUU occupancy
RUU_fcount                  5576629 # cumulative RUU full count
ruu_occupancy               15.3797 # avg RUU occupancy (insn's)
ruu_rate                     1.9532 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8741 # avg RUU occupant latency (cycle's)
ruu_full                     0.8284 # fraction of time (cycle's) RUU was full
LSQ_count                  31601770 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6942 # avg LSQ occupancy (insn's)
lsq_rate                     1.9532 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4034 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152186715 # total number of slip cycles
avg_sim_slip                11.6110 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189528 # total number of accesses
il1.hits                   13189350 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013184 # total number of accesses
dl1.hits                    4007457 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5811 # total number of hits
ul2.misses                      586 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0916 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189528 # total number of accesses
itlb.hits                  13189522 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357617 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
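
Note: the derived figures in the statistics above (sim_IPC, sim_CPI, and the per-cache miss rates) follow directly from the raw counters, as their descriptions state (e.g. misses/ref). The Python sketch below re-derives a few of them from lines copied out of this run as a sanity check. The `parse_stats` helper is hypothetical and assumes the simple `name value # description` layout seen in this log.

```python
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6732020 # total simulation time in cycles
dl1.accesses                4013184 # total number of accesses
dl1.misses                     5727 # total number of misses
ul2.accesses                   6397 # total number of accesses
ul2.misses                      586 # total number of misses
"""


def parse_stats(text: str) -> dict:
    """Collect 'name value' pairs, skipping anything that is not numeric."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing description
        if len(fields) != 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # e.g. hex addresses or values such as '96k'
    return stats


stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
print(f"IPC           {ipc:.4f}")      # log reports sim_IPC 1.9470
print(f"CPI           {1 / ipc:.4f}")  # log reports sim_CPI 0.5136
for cache in ("dl1", "ul2"):
    rate = stats[f"{cache}.misses"] / stats[f"{cache}.accesses"]
    print(f"{cache} miss rate {rate:.4f}")
```

The recomputed values agree with the reported sim_IPC, sim_CPI, dl1.miss_rate (0.0014) and ul2.miss_rate (0.0916) to four decimal places.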

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:51:58 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264803 # total number of instructions executed
sim_total_refs              4823975 # total number of loads and stores executed
sim_total_loads             2865500 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6247973 # total simulation time in cycles
sim_IPC                      1.8536 # instructions per cycle
sim_CPI                      0.5395 # cycles per instruction
sim_exec_BW                  1.9630 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18330224 # cumulative IFQ occupancy
IFQ_fcount                  3754257 # cumulative IFQ full count
ifq_occupancy                2.9338 # avg IFQ occupancy (insn's)
ifq_rate                     1.9630 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.4945 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6009 # fraction of time (cycle's) IFQ was full
RUU_count                  75565322 # cumulative RUU occupancy
RUU_fcount                  3150765 # cumulative RUU full count
ruu_occupancy               12.0944 # avg RUU occupancy (insn's)
ruu_rate                     1.9630 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1612 # avg RUU occupant latency (cycle's)
ruu_full                     0.5043 # fraction of time (cycle's) RUU was full
LSQ_count                  31743793 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0807 # avg LSQ occupancy (insn's)
lsq_rate                     1.9630 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5882 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121375206 # total number of slip cycles
avg_sim_slip                10.4801 # the average slip between issue and retirement
bpred_bimod.lookups         3257753 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442040 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820380 # total number of accesses
il1.hits                   12820163 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497786 # total number of accesses
dl1.hits                    4486581 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      17430 # total number of hits
ul2.misses                     1071 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0579 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820380 # total number of accesses
itlb.hits                  12820373 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918584 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:52:05 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
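
  The range grammar above can be illustrated with a small Python sketch.
  This is not the simulator's own parsing code; the names `RangeEnd',
  `parse_end' and `parse_range' are hypothetical, and values are kept as
  strings because program symbols (e.g., main) are allowed.

```python
from typing import NamedTuple, Optional, Tuple


class RangeEnd(NamedTuple):
    kind: str   # "addr", "cycle", "inst", or "relative"
    value: str  # number or program symbol, left unparsed


# Prefix characters named in the help text above.
PREFIX_KINDS = {"@": "addr", "#": "cycle", "+": "relative"}


def parse_end(text: str, allow_relative: bool) -> Optional[RangeEnd]:
    """Parse one end of a range; an empty string means 'omitted'."""
    if not text:
        return None
    prefix = text[0]
    if prefix == "+" and not allow_relative:
        raise ValueError("'+' is only valid on the end of a range")
    if prefix in PREFIX_KINDS:
        return RangeEnd(PREFIX_KINDS[prefix], text[1:])
    # No prefix: the value is an instruction count.
    return RangeEnd("inst", text)


def parse_range(text: str) -> Tuple[Optional[RangeEnd], Optional[RangeEnd]]:
    """Split '<start>:<end>' on the first colon and parse both ends."""
    start, sep, end = text.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    return parse_end(start, False), parse_end(end, True)


if __name__ == "__main__":
    for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(example, "->", parse_range(example))
```

  An omitted end is reported as None, so `:' yields (None, None), which
  corresponds to tracing the entire execution.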

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
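
  As a rough illustration of the <config> format, the following Python
  sketch splits a config string into its five fields and derives the total
  capacity as nsets * bsize * assoc.  It is a sketch only, not code from
  the simulator; `CacheConfig' and `parse_cache_config' are hypothetical
  names.

```python
from dataclasses import dataclass

# Replacement policy letters listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def size_bytes(self) -> int:
        # Capacity = sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configs taken from the command line of this run.
    for cfg in ["dl1:512:64:2:l", "il1:512:64:2:l", "ul2:256:256:8:l"]:
        cache = parse_cache_config(cfg)
        print(cache.name, cache.size_bytes // 1024, "KB",
              REPL_POLICIES[cache.repl])
```

  For a TLB config the <bsize> field holds the page size, so the same
  product gives the address range the TLB can map rather than a storage
  size.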

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375406 # total number of instructions executed
sim_total_refs              6748328 # total number of loads and stores executed
sim_total_loads             3824305 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                   8557278 # total simulation time in cycles
sim_IPC                      1.5567 # instructions per cycle
sim_CPI                      0.6424 # cycles per instruction
sim_exec_BW                  1.5630 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  33365939 # cumulative IFQ occupancy
IFQ_fcount                  8190812 # cumulative IFQ full count
ifq_occupancy                3.8991 # avg IFQ occupancy (insn's)
ifq_rate                     1.5630 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4946 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9572 # fraction of time (cycle's) IFQ was full
RUU_count                 134375219 # cumulative RUU occupancy
RUU_fcount                  8049914 # cumulative RUU full count
ruu_occupancy               15.7030 # avg RUU occupancy (insn's)
ruu_rate                     1.5630 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.0464 # avg RUU occupant latency (cycle's)
ruu_full                     0.9407 # fraction of time (cycle's) RUU was full
LSQ_count                  69718400 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1473 # avg LSQ occupancy (insn's)
lsq_rate                     1.5630 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.2124 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  223805396 # total number of slip cycles
avg_sim_slip                16.8011 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380511 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89851 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90457 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89851 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400096 # total number of accesses
il1.hits                   13399369 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169507 # total number of accesses
dl1.hits                    5881781 # total number of hits
dl1.misses                   287726 # total number of misses
dl1.replacements             286702 # total number of replacements
dl1.writebacks               143445 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431898 # total number of accesses
ul2.hits                     423631 # total number of hits
ul2.misses                     8267 # total number of misses
ul2.replacements               6219 # total number of replacements
ul2.writebacks                 4885 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0191 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0144 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0113 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400096 # total number of accesses
itlb.hits                  13400077 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736809 # total number of hits
dtlb.misses                    4189 # total number of misses
dtlb.replacements              4061 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766208 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
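
The rate lines in the statistics above are ratios of the raw counters (the
descriptions give them, e.g. misses/ref).  A minimal Python sketch, not part
of sim-outorder, that re-derives three of them from counter lines copied
from this fft run.  `parse_stats' is a hypothetical helper, and taking
sim_IPC as sim_num_insn / sim_cycle is an assumption based on the
"instructions per cycle" description.

```python
def parse_stats(text: str) -> dict:
    """Collect 'name value # description' lines into a name -> float map.

    Lines whose value is not a plain number (hex addresses, '760k',
    '<error: divide by zero>') are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue
    return stats


# Counter lines copied from the fft run above.
SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   8557278 # total simulation time in cycles
dl1.accesses                6169507 # total number of accesses
dl1.misses                   287726 # total number of misses
ul2.accesses                 431898 # total number of accesses
ul2.misses                     8267 # total number of misses
"""

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]

print(f"sim_IPC       {ipc:.4f}")
print(f"dl1.miss_rate {dl1_miss_rate:.4f}")
print(f"ul2.miss_rate {ul2_miss_rate:.4f}")
```

The printed values can be compared against the sim_IPC, dl1.miss_rate and
ul2.miss_rate lines reported by the simulator for the same run.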

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:52:16 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450141 # total number of instructions executed
sim_total_refs              6589113 # total number of loads and stores executed
sim_total_loads             4939056 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12482864 # total simulation time in cycles
sim_IPC                      1.7127 # instructions per cycle
sim_CPI                      0.5839 # cycles per instruction
sim_exec_BW                  1.7184 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48805313 # cumulative IFQ occupancy
IFQ_fcount                 11581581 # cumulative IFQ full count
ifq_occupancy                3.9098 # avg IFQ occupancy (insn's)
ifq_rate                     1.7184 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2753 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9278 # fraction of time (cycle's) IFQ was full
RUU_count                 198338237 # cumulative RUU occupancy
RUU_fcount                 12359142 # cumulative RUU full count
ruu_occupancy               15.8888 # avg RUU occupancy (insn's)
ruu_rate                     1.7184 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2465 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63597847 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0948 # avg LSQ occupancy (insn's)
lsq_rate                     1.7184 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9649 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289716783 # total number of slip cycles
avg_sim_slip                13.5513 # the average slip between issue and retirement
bpred_bimod.lookups         1654491 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483056 # total number of accesses
il1.hits                   21482877 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943957 # total number of accesses
dl1.hits                    4940826 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3460 # total number of hits
ul2.misses                      641 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1563 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483056 # total number of accesses
itlb.hits                  21483050 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886482 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:52:30 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
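
  The range syntax above can be illustrated with a small Python sketch.
  This is not the simulator's own parser: the function names and the
  "address"/"cycle"/"relative"/"inst" labels are hypothetical, and values
  are kept as strings because program symbols such as `main' are allowed.

```python
# Hypothetical helper, not part of SimpleScalar: splits a pipetrace range
# of the form {{@|#}<start>}:{{@|#|+}<end>} into labelled endpoints.
KINDS = {"@": "address", "#": "cycle", "+": "relative"}


def _parse_endpoint(token, allowed_prefixes):
    """Return (kind, value) for one endpoint, or None if it was omitted."""
    if not token:
        return None
    if token[0] in allowed_prefixes:
        return KINDS[token[0]], token[1:]
    # No prefix: the value is an instruction count.
    return "inst", token


def parse_ptrace_range(text):
    """Return (start, end); each is (kind, value) or None when omitted."""
    start, sep, end = text.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    # '+' (relative to start) is only meaningful on the end of the range.
    return _parse_endpoint(start, "@#"), _parse_endpoint(end, "@#+")


if __name__ == "__main__":
    for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(example, "->", parse_ptrace_range(example))
```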

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (listed in N, W, M, X order):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
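
  As a rough illustration of the <config> format, the Python sketch below
  splits a config string into its five fields and derives the total
  capacity as nsets * bsize * assoc.  The class and function names are
  hypothetical and not part of SimpleScalar.  For a TLB config, <bsize>
  is the page size, so the same product gives the address range the TLB
  can map rather than a data capacity.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    """Hypothetical container for <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Total capacity: sets x block size (bytes) x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse a config string such as 'ul2:256:256:8:l'."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError("replacement policy must be 'l', 'f', or 'r'")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configs taken from the command lines in this log.
    for cfg in ("dl1:512:64:2:l", "ul2:256:256:8:l", "ul2:128:256:8:l"):
        parsed = parse_cache_config(cfg)
        print(cfg, "->", parsed.capacity_bytes // 1024, "KB")
```

  With these definitions, dl1:512:64:2:l works out to 64 KB,
  ul2:256:256:8:l to 512 KB, and ul2:128:256:8:l to 256 KB.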

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861423 # total number of instructions executed
sim_total_refs              8653608 # total number of loads and stores executed
sim_total_loads             7203833 # total number of loads executed
sim_total_stores       1449775.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  35894269 # total simulation time in cycles
sim_IPC                      0.7762 # instructions per cycle
sim_CPI                      1.2884 # cycles per instruction
sim_exec_BW                  0.7762 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 143527792 # cumulative IFQ occupancy
IFQ_fcount                 35881641 # cumulative IFQ full count
ifq_occupancy                3.9986 # avg IFQ occupancy (insn's)
ifq_rate                     0.7762 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.1515 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 574113145 # cumulative RUU occupancy
RUU_fcount                 35880952 # cumulative RUU full count
ruu_occupancy               15.9946 # avg RUU occupancy (insn's)
ruu_rate                     0.7762 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.6060 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 173150403 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8239 # avg LSQ occupancy (insn's)
lsq_rate                     0.7762 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2147 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  783769233 # total number of slip cycles
avg_sim_slip                28.1327 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          123 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860802 # total number of accesses
il1.hits                   27860591 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5940463 # total number of hits
dl1.misses                  2712938 # total number of misses
dl1.replacements            2711914 # total number of replacements
dl1.writebacks               955399 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3135 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3134 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3668548 # total number of accesses
ul2.hits                    3651434 # total number of hits
ul2.misses                    17114 # total number of misses
ul2.replacements              15066 # total number of replacements
ul2.writebacks                 5007 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0047 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0041 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0014 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860802 # total number of accesses
itlb.hits                  27860796 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017796 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
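
The derived rates in the statistics block above can be recomputed from the
raw counters. The Python sketch below assumes each statistic line has the
layout `name value # description` seen in this listing; the function name
is hypothetical. It skips option lines (those starting with `-` or `#`) and
keeps non-numeric values such as `120k` or `<error: divide by zero>` as
strings.

```python
def parse_stat_line(line):
    """Return (name, value, description), or None for non-statistic lines."""
    body, sep, desc = line.partition(" # ")
    parts = body.split(None, 1)
    if not sep or len(parts) != 2 or parts[0].startswith(("-", "#")):
        return None
    name, raw = parts[0], parts[1].strip()
    try:
        value = int(raw, 0)          # handles decimal and 0x... values
    except ValueError:
        try:
            value = float(raw)       # e.g. 1449647.0000
        except ValueError:
            value = raw              # e.g. 120k, <error: divide by zero>
    return name, value, desc.strip()


# A few lines copied from the alphaBlend run above.
SAMPLE = """\
-seed                       1 # random number generator seed (0 for timer seed)
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  35894269 # total simulation time in cycles
dl1.accesses                8653401 # total number of accesses
dl1.misses                  2712938 # total number of misses
"""

stats = {}
for text_line in SAMPLE.splitlines():
    parsed = parse_stat_line(text_line)
    if parsed is not None:
        stats[parsed[0]] = parsed[1]

ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

if __name__ == "__main__":
    print("IPC: %.4f  dl1 miss rate: %.4f" % (ipc, dl1_miss_rate))
```

The two ratios agree with the reported sim_IPC (0.7762) and dl1.miss_rate
(0.3135) to four decimal places.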

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:52:53 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13149022 # total number of instructions executed
sim_total_refs              4034214 # total number of loads and stores executed
sim_total_loads             3020650 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   6732020 # total simulation time in cycles
sim_IPC                      1.9470 # instructions per cycle
sim_CPI                      0.5136 # cycles per instruction
sim_exec_BW                  1.9532 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25064594 # cumulative IFQ occupancy
IFQ_fcount                  6140534 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9532 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9062 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9121 # fraction of time (cycle's) IFQ was full
RUU_count                 103536402 # cumulative RUU occupancy
RUU_fcount                  5576629 # cumulative RUU full count
ruu_occupancy               15.3797 # avg RUU occupancy (insn's)
ruu_rate                     1.9532 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8741 # avg RUU occupant latency (cycle's)
ruu_full                     0.8284 # fraction of time (cycle's) RUU was full
LSQ_count                  31601770 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6942 # avg LSQ occupancy (insn's)
lsq_rate                     1.9532 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4034 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152186715 # total number of slip cycles
avg_sim_slip                11.6110 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189528 # total number of accesses
il1.hits                   13189350 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013184 # total number of accesses
dl1.hits                    4007457 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5811 # total number of hits
ul2.misses                      586 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0916 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189528 # total number of accesses
itlb.hits                  13189522 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357617 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:53:01 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264803 # total number of instructions executed
sim_total_refs              4823975 # total number of loads and stores executed
sim_total_loads             2865500 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6248893 # total simulation time in cycles
sim_IPC                      1.8534 # instructions per cycle
sim_CPI                      0.5396 # cycles per instruction
sim_exec_BW                  1.9627 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18330224 # cumulative IFQ occupancy
IFQ_fcount                  3754257 # cumulative IFQ full count
ifq_occupancy                2.9334 # avg IFQ occupancy (insn's)
ifq_rate                     1.9627 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.4945 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6008 # fraction of time (cycle's) IFQ was full
RUU_count                  75565302 # cumulative RUU occupancy
RUU_fcount                  3150765 # cumulative RUU full count
ruu_occupancy               12.0926 # avg RUU occupancy (insn's)
ruu_rate                     1.9627 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1612 # avg RUU occupant latency (cycle's)
ruu_full                     0.5042 # fraction of time (cycle's) RUU was full
LSQ_count                  31743785 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0799 # avg LSQ occupancy (insn's)
lsq_rate                     1.9627 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5882 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121375178 # total number of slip cycles
avg_sim_slip                10.4801 # the average slip between issue and retirement
bpred_bimod.lookups         3257753 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442040 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820381 # total number of accesses
il1.hits                   12820164 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497786 # total number of accesses
dl1.hits                    4486581 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      17421 # total number of hits
ul2.misses                     1080 # total number of misses
ul2.replacements                 70 # total number of replacements
ul2.writebacks                   42 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0584 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0038 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820381 # total number of accesses
itlb.hits                  12820374 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918588 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:53:09 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375415 # total number of instructions executed
sim_total_refs              6748332 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                   9741929 # total simulation time in cycles
sim_IPC                      1.3674 # instructions per cycle
sim_CPI                      0.7313 # cycles per instruction
sim_exec_BW                  1.3730 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  38104851 # cumulative IFQ occupancy
IFQ_fcount                  9375463 # cumulative IFQ full count
ifq_occupancy                3.9114 # avg IFQ occupancy (insn's)
ifq_rate                     1.3730 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.8489 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9624 # fraction of time (cycle's) IFQ was full
RUU_count                 153336559 # cumulative RUU occupancy
RUU_fcount                  9234909 # cumulative RUU full count
ruu_occupancy               15.7399 # avg RUU occupancy (insn's)
ruu_rate                     1.3730 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.4641 # avg RUU occupant latency (cycle's)
ruu_full                     0.9480 # fraction of time (cycle's) RUU was full
LSQ_count                  79718843 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1831 # avg LSQ occupancy (insn's)
lsq_rate                     1.3730 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.9601 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  252767157 # total number of slip cycles
avg_sim_slip                18.9752 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380511 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89851 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90457 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89851 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400105 # total number of accesses
il1.hits                   13399378 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169520 # total number of accesses
dl1.hits                    5881789 # total number of hits
dl1.misses                   287731 # total number of misses
dl1.replacements             286707 # total number of replacements
dl1.writebacks               143445 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431903 # total number of accesses
ul2.hits                     404016 # total number of hits
ul2.misses                    27887 # total number of misses
ul2.replacements              26863 # total number of replacements
ul2.writebacks                23516 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0646 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0622 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0544 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400105 # total number of accesses
itlb.hits                  13400086 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736809 # total number of hits
dtlb.misses                    4189 # total number of misses
dtlb.replacements              4061 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766256 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:53:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
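
The range grammar above can be illustrated with a small parser. This is only a sketch and not the simulator's own code: the names `Endpoint`, `parse_endpoint`, and `parse_range` are hypothetical, and values are kept as text because the help message does not say how numbers or program symbols are evaluated.

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    kind: str   # "addr", "cycle", "inst", or "relative"
    value: str  # number or program symbol, kept as text (not evaluated)


def parse_endpoint(text: str, is_end: bool) -> Optional[Endpoint]:
    """Classify one side of a range by its prefix; an empty side is None."""
    if not text:
        return None
    prefixes = {"@": "addr", "#": "cycle"}
    if is_end:
        prefixes["+"] = "relative"  # '+' is only described for the second argument
    kind = prefixes.get(text[0])
    if kind is None:
        if text[0] == "+":
            raise ValueError("'+' is only valid on the end of a range")
        return Endpoint("inst", text)  # no prefix: instruction count
    if not text[1:]:
        raise ValueError("prefix without a value: %r" % text)
    return Endpoint(kind, text[1:])


def parse_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' and classify both sides."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    return parse_endpoint(start, False), parse_endpoint(end, True)


for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
    print(example, "->", parse_range(example))
```

A relative end such as `+278` is left unresolved here; turning it into an absolute value would require evaluating the start, which this sketch deliberately avoids.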

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (values listed as N, W, M, X):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
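
The sample rows appear to list their values as N, W, M, X, while the `-bpred:2lev` option line in this log takes `<l1size> <l2size> <hist_size> <xor>`. The sketch below builds the option arguments in the option-line order for the two global-history schemes. It assumes N maps to `<l1size>`, M to `<l2size>`, and W to `<hist_size>`, as the `Configurations:   N, M, W, X` line suggests; the function name is hypothetical.

```python
def two_level_args(scheme: str, w: int) -> str:
    """Return '-bpred:2lev' arguments as '<l1size> <l2size> <hist_size> <xor>'.

    Covers only GAg and gshare, whose table sizes follow directly from the
    history width W in the sample list (N = 1, M = 2^W).
    """
    xor_flag = {"GAg": 0, "gshare": 1}
    if scheme not in xor_flag:
        raise ValueError("unsupported scheme: %r" % scheme)
    if w < 1:
        raise ValueError("history width must be positive")
    l1size = 1        # N: a single global shift register
    l2size = 2 ** w   # M: one counter per history pattern
    return "%d %d %d %d" % (l1size, l2size, w, xor_flag[scheme])


print("-bpred:2lev", two_level_args("GAg", 8))      # 1 256 8 0
print("-bpred:2lev", two_level_args("gshare", 12))  # 1 4096 12 1
```

The default shown in the option listing, `1 1024 8 0`, has M greater than 2^W for W = 8, which matches the GAp row rather than GAg.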

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
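
As an illustration of the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, the sketch below parses a config string and derives total capacity as nsets x bsize x assoc. It assumes `<bsize>` is in bytes; `CacheConfig` and `parse_cache_config` are hypothetical names, not part of the simulator.

```python
from typing import NamedTuple

REPLACEMENT_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size, assumed to be in bytes
    assoc: int
    repl: str    # one of 'l', 'f', 'r'

    @property
    def capacity_bytes(self) -> int:
        # Every set holds `assoc` blocks of `bsize` bytes.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(config: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = config.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields: %r" % config)
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT_POLICIES:
        raise ValueError("unknown replacement policy: %r" % repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# The two cache configs used by this run's command line.
for spec in ("dl1:512:64:2:l", "ul2:128:256:8:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KiB",
          REPLACEMENT_POLICIES[cfg.repl])
```

Under these assumptions the run uses a 64 KiB two-way L1 data cache and a 256 KiB eight-way unified L2, both with LRU replacement.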

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450141 # total number of instructions executed
sim_total_refs              6589113 # total number of loads and stores executed
sim_total_loads             4939056 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12482864 # total simulation time in cycles
sim_IPC                      1.7127 # instructions per cycle
sim_CPI                      0.5839 # cycles per instruction
sim_exec_BW                  1.7184 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48805313 # cumulative IFQ occupancy
IFQ_fcount                 11581581 # cumulative IFQ full count
ifq_occupancy                3.9098 # avg IFQ occupancy (insn's)
ifq_rate                     1.7184 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2753 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9278 # fraction of time (cycle's) IFQ was full
RUU_count                 198338237 # cumulative RUU occupancy
RUU_fcount                 12359142 # cumulative RUU full count
ruu_occupancy               15.8888 # avg RUU occupancy (insn's)
ruu_rate                     1.7184 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2465 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63597847 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0948 # avg LSQ occupancy (insn's)
lsq_rate                     1.7184 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9649 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289716783 # total number of slip cycles
avg_sim_slip                13.5513 # the average slip between issue and retirement
bpred_bimod.lookups         1654491 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483056 # total number of accesses
il1.hits                   21482877 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943957 # total number of accesses
dl1.hits                    4940826 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3460 # total number of hits
ul2.misses                      641 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1563 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483056 # total number of accesses
itlb.hits                  21483050 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886482 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
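
The derived statistics in a block like the one above can be cross-checked from the raw counters. The sketch below assumes each statistic line has the shape `name value # description` and skips anything else (option lines, hex addresses, `760k`-style sizes, `<error: ...>` entries). `parse_stats` is a hypothetical helper, and the sample lines are copied from the `filter` run.

```python
def parse_stats(text: str) -> dict:
    """Collect 'name value # description' lines into {name: float}."""
    stats = {}
    for line in text.splitlines():
        head, sep, _ = line.partition("#")
        fields = head.split()
        if not sep or len(fields) != 2:
            continue  # not a statistic line
        name, raw = fields
        if name.startswith("-"):
            continue  # simulator option, not a statistic
        try:
            stats[name] = float(raw)
        except ValueError:
            continue  # e.g. 0x00400000 or 120k
    return stats


SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12482864 # total simulation time in cycles
sim_IPC                      1.7127 # instructions per cycle
ul2.accesses                   4101 # total number of accesses
ul2.misses                      641 # total number of misses
ul2.miss_rate                0.1563 # miss rate (i.e., misses/ref)
"""

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]
print("IPC recomputed: %.4f, reported: %.4f" % (ipc, stats["sim_IPC"]))
print("ul2 miss rate recomputed: %.4f, reported: %.4f"
      % (ul2_miss_rate, stats["ul2.miss_rate"]))
```

Comparing recomputed and reported values to four decimal places is enough here, since that is the precision the simulator prints.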

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:53:34 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861423 # total number of instructions executed
sim_total_refs              8653608 # total number of loads and stores executed
sim_total_loads             7203833 # total number of loads executed
sim_total_stores       1449775.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  35894269 # total simulation time in cycles
sim_IPC                      0.7762 # instructions per cycle
sim_CPI                      1.2884 # cycles per instruction
sim_exec_BW                  0.7762 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 143527792 # cumulative IFQ occupancy
IFQ_fcount                 35881641 # cumulative IFQ full count
ifq_occupancy                3.9986 # avg IFQ occupancy (insn's)
ifq_rate                     0.7762 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.1515 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 574113145 # cumulative RUU occupancy
RUU_fcount                 35880952 # cumulative RUU full count
ruu_occupancy               15.9946 # avg RUU occupancy (insn's)
ruu_rate                     0.7762 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.6060 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 173150403 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8239 # avg LSQ occupancy (insn's)
lsq_rate                     0.7762 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2147 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  783769233 # total number of slip cycles
avg_sim_slip                28.1327 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          123 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860802 # total number of accesses
il1.hits                   27860591 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5940463 # total number of hits
dl1.misses                  2712938 # total number of misses
dl1.replacements            2711914 # total number of replacements
dl1.writebacks               955399 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3135 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3134 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3668548 # total number of accesses
ul2.hits                    3651434 # total number of hits
ul2.misses                    17114 # total number of misses
ul2.replacements              16090 # total number of replacements
ul2.writebacks                 5391 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0047 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0044 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860802 # total number of accesses
itlb.hits                  27860796 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017796 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:53:57 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148995 # total number of instructions executed
sim_total_refs              4034201 # total number of loads and stores executed
sim_total_loads             3020644 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6744704 # total simulation time in cycles
sim_IPC                      1.9433 # instructions per cycle
sim_CPI                      0.5146 # cycles per instruction
sim_exec_BW                  1.9495 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25099871 # cumulative IFQ occupancy
IFQ_fcount                  6149368 # cumulative IFQ full count
ifq_occupancy                3.7214 # avg IFQ occupancy (insn's)
ifq_rate                     1.9495 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9089 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9117 # fraction of time (cycle's) IFQ was full
RUU_count                 103679295 # cumulative RUU occupancy
RUU_fcount                  5585450 # cumulative RUU full count
ruu_occupancy               15.3720 # avg RUU occupancy (insn's)
ruu_rate                     1.9495 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8850 # avg RUU occupant latency (cycle's)
ruu_full                     0.8281 # fraction of time (cycle's) RUU was full
LSQ_count                  31646988 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6921 # avg LSQ occupancy (insn's)
lsq_rate                     1.9495 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4068 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152378630 # total number of slip cycles
avg_sim_slip                11.6256 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189495 # total number of accesses
il1.hits                   13189317 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013188 # total number of accesses
dl1.hits                    4007461 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6239 # total number of hits
ul2.misses                      158 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0247 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189495 # total number of accesses
itlb.hits                  13189489 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357475 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:54:05 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264893 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865507 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6258896 # total simulation time in cycles
sim_IPC                      1.8504 # instructions per cycle
sim_CPI                      0.5404 # cycles per instruction
sim_exec_BW                  1.9596 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18355743 # cumulative IFQ occupancy
IFQ_fcount                  3760606 # cumulative IFQ full count
ifq_occupancy                2.9327 # avg IFQ occupancy (insn's)
ifq_rate                     1.9596 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.4966 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6008 # fraction of time (cycle's) IFQ was full
RUU_count                  75670927 # cumulative RUU occupancy
RUU_fcount                  3156716 # cumulative RUU full count
ruu_occupancy               12.0901 # avg RUU occupancy (insn's)
ruu_rate                     1.9596 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1697 # avg RUU occupant latency (cycle's)
ruu_full                     0.5044 # fraction of time (cycle's) RUU was full
LSQ_count                  31769517 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0759 # avg LSQ occupancy (insn's)
lsq_rate                     1.9596 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5903 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121506321 # total number of slip cycles
avg_sim_slip                10.4914 # the average slip between issue and retirement
bpred_bimod.lookups         3257776 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442060 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820428 # total number of accesses
il1.hits                   12820211 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18220 # total number of hits
ul2.misses                      281 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0152 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820428 # total number of accesses
itlb.hits                  12820421 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918790 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:54:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
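
  Note: as an illustration of the <config> format above, the following is a
  minimal Python sketch that splits a config string into its five fields and
  computes the total capacity as nsets * bsize * assoc.  The names
  `CacheConfig' and `parse_cache_config' are hypothetical helpers for this
  note only; they are not part of SimpleScalar.  For a TLB config, <bsize> is
  the page size, so the same product gives the address range covered rather
  than a data capacity.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    """Fields of a <name>:<nsets>:<bsize>:<assoc>:<repl> string (hypothetical type)."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total capacity of a set-associative structure.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(config: str) -> CacheConfig:
    """Parse a cache/TLB config string; hypothetical helper, not a SimpleScalar API."""
    parts = config.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {config!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy {repl!r} (expected l, f, or r)")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configs taken from the command lines in this log.
    for cfg in ("dl1:512:64:2:l", "ul2:64:1024:8:l", "ul2:1024:64:8:l"):
        parsed = parse_cache_config(cfg)
        print(parsed.name, parsed.capacity_bytes)
```

  With the configs used in this log, dl1:512:64:2:l works out to 65536 bytes,
  and both ul2:64:1024:8:l and ul2:1024:64:8:l work out to 524288 bytes: the
  two L2 variants have the same capacity and differ only in set count and
  block size.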

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13376380 # total number of instructions executed
sim_total_refs              6748334 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                   8654168 # total simulation time in cycles
sim_IPC                      1.5392 # instructions per cycle
sim_CPI                      0.6497 # cycles per instruction
sim_exec_BW                  1.5457 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  33694450 # cumulative IFQ occupancy
IFQ_fcount                  8272911 # cumulative IFQ full count
ifq_occupancy                3.8934 # avg IFQ occupancy (insn's)
ifq_rate                     1.5457 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.5190 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9559 # fraction of time (cycle's) IFQ was full
RUU_count                 135704547 # cumulative RUU occupancy
RUU_fcount                  8131743 # cumulative RUU full count
ruu_occupancy               15.6808 # avg RUU occupancy (insn's)
ruu_rate                     1.5457 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.1451 # avg RUU occupant latency (cycle's)
ruu_full                     0.9396 # fraction of time (cycle's) RUU was full
LSQ_count                  70464310 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1422 # avg LSQ occupancy (insn's)
lsq_rate                     1.5457 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.2678 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  225874614 # total number of slip cycles
avg_sim_slip                16.9564 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400111 # total number of accesses
il1.hits                   13399384 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169524 # total number of accesses
dl1.hits                    5881797 # total number of hits
dl1.misses                   287727 # total number of misses
dl1.replacements             286703 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431898 # total number of accesses
ul2.hits                     429662 # total number of hits
ul2.misses                     2236 # total number of misses
ul2.replacements               1724 # total number of replacements
ul2.writebacks                 1367 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0052 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0040 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0032 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400111 # total number of accesses
itlb.hits                  13400092 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736797 # total number of hits
dtlb.misses                    4201 # total number of misses
dtlb.replacements              4073 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766276 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
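
Note: the derived statistics above can be cross-checked against the raw
counters they are built from (for example sim_IPC against
sim_num_insn / sim_cycle, and dl1.miss_rate against dl1.misses /
dl1.accesses).  The following is a minimal Python sketch of such a check,
assuming each statistic line has the form `<name> <value> # <description>'
as printed in this log.  `parse_stats' is a hypothetical helper, not part of
SimpleScalar, and the sample lines are copied from the fft run above.

```python
import re

# A statistic line: a name starting with a letter, a value, then "# description".
# Option lines (which start with "-" or "#") do not match.
STAT_LINE = re.compile(r"^([A-Za-z]\S*)\s+(\S+)\s+#\s")


def parse_stats(text: str) -> dict:
    """Collect numeric statistics by name; hypothetical helper for this note."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            # Skip non-decimal values such as 0x00400000 or 760k.
            continue
    return stats


SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   8654168 # total simulation time in cycles
sim_IPC                      1.5392 # instructions per cycle
dl1.accesses                6169524 # total number of accesses
dl1.misses                   287727 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
"""

if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    print("IPC recomputed:", round(s["sim_num_insn"] / s["sim_cycle"], 4))
    print("dl1 miss rate recomputed:", round(s["dl1.misses"] / s["dl1.accesses"], 4))
```

Because this file concatenates several runs, a later run's values would
overwrite an earlier run's in the dictionary; split the text at each
`sim: command line:' line before parsing if per-run results are needed.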

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:54:24 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450120 # total number of instructions executed
sim_total_refs              6589103 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12492277 # total simulation time in cycles
sim_IPC                      1.7114 # instructions per cycle
sim_CPI                      0.5843 # cycles per instruction
sim_exec_BW                  1.7171 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48827248 # cumulative IFQ occupancy
IFQ_fcount                 11586923 # cumulative IFQ full count
ifq_occupancy                3.9086 # avg IFQ occupancy (insn's)
ifq_rate                     1.7171 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2763 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9275 # fraction of time (cycle's) IFQ was full
RUU_count                 198424992 # cumulative RUU occupancy
RUU_fcount                 12363922 # cumulative RUU full count
ruu_occupancy               15.8838 # avg RUU occupancy (insn's)
ruu_rate                     1.7171 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2505 # avg RUU occupant latency (cycle's)
ruu_full                     0.9897 # fraction of time (cycle's) RUU was full
LSQ_count                  63627462 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0933 # avg LSQ occupancy (insn's)
lsq_rate                     1.7171 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9663 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289835742 # total number of slip cycles
avg_sim_slip                13.5569 # the average slip between issue and retirement
bpred_bimod.lookups         1654485 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           77 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483029 # total number of accesses
il1.hits                   21482850 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944146 # total number of accesses
dl1.hits                    4941015 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3929 # total number of hits
ul2.misses                      172 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0419 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483029 # total number of accesses
itlb.hits                  21483023 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886370 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:54:37 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
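
  The range grammar above can be sketched as a small parser.  This is only
  an illustration of the format as described here, not the simulator's own
  parsing code; the names parse_endpoint and parse_ptrace_range are
  hypothetical.  Values are kept as strings because program symbols (such
  as main) are allowed in place of numbers.

```python
from typing import Optional, Tuple

# One side of a range: (kind, value), or None when that side is omitted.
Endpoint = Optional[Tuple[str, str]]

# Prefix characters described in the help text above.
PREFIX_KINDS = {"@": "address", "#": "cycle"}


def parse_endpoint(text: str, allow_relative: bool) -> Endpoint:
    """Classify one side of the range; an empty side means 'unbounded'."""
    if not text:
        return None
    if text[0] in PREFIX_KINDS:
        return (PREFIX_KINDS[text[0]], text[1:])
    if text[0] == "+":
        if not allow_relative:
            raise ValueError("'+' is only valid on the end of a range")
        return ("relative", text[1:])
    # No prefix: the value is an instruction count.
    return ("instruction", text)


def parse_ptrace_range(spec: str) -> Tuple[Endpoint, Endpoint]:
    """Split '<start>:<end>' and classify both sides."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError(f"range {spec!r} must contain ':'")
    return parse_endpoint(start, False), parse_endpoint(end, True)
```

  Under these assumptions, parse_ptrace_range("@main:+278") yields an
  address start at main and a relative end of 278, and
  parse_ptrace_range(":") yields (None, None), i.e. the entire execution.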

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
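
  As a rough illustration of the <config> format, the sketch below splits
  a config string into its five fields and derives the total capacity as
  nsets * bsize * assoc.  The helper names (CacheConfig, parse_cache_config,
  capacity_bytes) are hypothetical and are not part of SimpleScalar.

```python
from typing import NamedTuple

# Replacement policy letters listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int   # number of sets
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int   # associativity (ways per set)
    repl: str    # replacement policy letter


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields in {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg: CacheConfig) -> int:
    """Total data capacity: sets x block size x ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc


# Configurations taken from this run's command line.
for spec in ("dl1:512:64:2:l", "ul2:64:1024:8:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, capacity_bytes(cfg) // 1024, "KB", REPL_POLICIES[cfg.repl])
```

  For this run that works out to a 64 KB dl1 (512 * 64 * 2 bytes) and a
  512 KB ul2 (64 * 1024 * 8 bytes).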

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27862467 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  36179429 # total simulation time in cycles
sim_IPC                      0.7700 # instructions per cycle
sim_CPI                      1.2986 # cycles per instruction
sim_exec_BW                  0.7701 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 144643255 # cumulative IFQ occupancy
IFQ_fcount                 36160572 # cumulative IFQ full count
ifq_occupancy                3.9979 # avg IFQ occupancy (insn's)
ifq_rate                     0.7701 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.1913 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 578573509 # cumulative RUU occupancy
RUU_fcount                 36159544 # cumulative RUU full count
ruu_occupancy               15.9918 # avg RUU occupancy (insn's)
ruu_rate                     0.7701 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.7653 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 173424916 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7935 # avg LSQ occupancy (insn's)
lsq_rate                     0.7701 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2243 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  788506931 # total number of slip cycles
avg_sim_slip                28.3028 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860776 # total number of accesses
il1.hits                   27860565 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5831566 # total number of hits
dl1.misses                  2821832 # total number of misses
dl1.replacements            2820808 # total number of replacements
dl1.writebacks               953977 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3261 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3260 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3776020 # total number of accesses
ul2.hits                    3771720 # total number of hits
ul2.misses                     4300 # total number of misses
ul2.replacements               3788 # total number of replacements
ul2.writebacks                 1258 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0011 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0010 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0003 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860776 # total number of accesses
itlb.hits                  27860770 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017686 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
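
Each statistics line above has the shape "<name> <value> # <description>".
A minimal sketch for loading such lines and cross-checking a derived rate
against its raw counters follows; parse_stats and ratio_matches are
hypothetical helper names, and the tolerance assumes rates are printed to
four decimal places as in this log.

```python
import re

# Name starts in column 0, then the value, then " # description".
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(.+?)\s+#\s+(.*)$")


def parse_value(text):
    """Convert to int (decimal or 0x hex) or float; otherwise keep the string."""
    try:
        return int(text, 16) if text.lower().startswith("0x") else int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text  # e.g. "120k" or "<error: divide by zero>"


def parse_stats(log_text):
    """Return {name: value} for every statistics line found in log_text."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = parse_value(match.group(2))
    return stats


def ratio_matches(stats, rate, numerator, denominator, tol=1e-4):
    """Check that a printed rate agrees with numerator / denominator."""
    return abs(stats[numerator] / stats[denominator] - stats[rate]) <= tol


sample = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  36179429 # total simulation time in cycles
sim_IPC                      0.7700 # instructions per cycle
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2821832 # total number of misses
dl1.miss_rate                0.3261 # miss rate (i.e., misses/ref)
"""
stats = parse_stats(sample)
print(ratio_matches(stats, "sim_IPC", "sim_num_insn", "sim_cycle"))
print(ratio_matches(stats, "dl1.miss_rate", "dl1.misses", "dl1.accesses"))
```

Because several runs are concatenated in this log, the text would need to
be split per run (for example at each "sim: command line:" line) before
parsing, since later runs would otherwise overwrite earlier values.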

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:55:00 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148995 # total number of instructions executed
sim_total_refs              4034201 # total number of loads and stores executed
sim_total_loads             3020644 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6744704 # total simulation time in cycles
sim_IPC                      1.9433 # instructions per cycle
sim_CPI                      0.5146 # cycles per instruction
sim_exec_BW                  1.9495 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25099871 # cumulative IFQ occupancy
IFQ_fcount                  6149368 # cumulative IFQ full count
ifq_occupancy                3.7214 # avg IFQ occupancy (insn's)
ifq_rate                     1.9495 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9089 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9117 # fraction of time (cycle's) IFQ was full
RUU_count                 103679295 # cumulative RUU occupancy
RUU_fcount                  5585450 # cumulative RUU full count
ruu_occupancy               15.3720 # avg RUU occupancy (insn's)
ruu_rate                     1.9495 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8850 # avg RUU occupant latency (cycle's)
ruu_full                     0.8281 # fraction of time (cycle's) RUU was full
LSQ_count                  31646988 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6921 # avg LSQ occupancy (insn's)
lsq_rate                     1.9495 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4068 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152378630 # total number of slip cycles
avg_sim_slip                11.6256 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189495 # total number of accesses
il1.hits                   13189317 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013188 # total number of accesses
dl1.hits                    4007461 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6239 # total number of hits
ul2.misses                      158 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0247 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189495 # total number of accesses
itlb.hits                  13189489 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357475 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:55:08 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
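
  As a minimal sketch of how the <config> format above can be read, the
  following Python splits a config string into its five fields and derives
  the total capacity as nsets * bsize * assoc.  The names CacheConfig and
  parse_cache_config are hypothetical helpers for illustration and are not
  part of SimpleScalar.  For a TLB config, <bsize> is the page size, so the
  same product gives the address range covered rather than a storage size.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for one <name>:<nsets>:<bsize>:<assoc>:<repl> string."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def size_bytes(self) -> int:
        # Capacity = number of sets * block size * associativity.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Split a config string into its five colon-separated fields."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configs taken from the command line of this run.
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:32:1024:8:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.size_bytes // 1024} KiB, "
              f"{cfg.assoc}-way, {REPL_POLICIES[cfg.repl]}")
```

  With the configs used in this run, the sketch shows that dl1 and il1 are
  64 KiB each (512 * 64 * 2 bytes) and ul2 is 256 KiB (32 * 1024 * 8 bytes).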

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264893 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865507 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6262004 # total simulation time in cycles
sim_IPC                      1.8495 # instructions per cycle
sim_CPI                      0.5407 # cycles per instruction
sim_exec_BW                  1.9586 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18355743 # cumulative IFQ occupancy
IFQ_fcount                  3760606 # cumulative IFQ full count
ifq_occupancy                2.9313 # avg IFQ occupancy (insn's)
ifq_rate                     1.9586 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.4966 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6005 # fraction of time (cycle's) IFQ was full
RUU_count                  75670927 # cumulative RUU occupancy
RUU_fcount                  3156716 # cumulative RUU full count
ruu_occupancy               12.0841 # avg RUU occupancy (insn's)
ruu_rate                     1.9586 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1697 # avg RUU occupant latency (cycle's)
ruu_full                     0.5041 # fraction of time (cycle's) RUU was full
LSQ_count                  31769517 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0734 # avg LSQ occupancy (insn's)
lsq_rate                     1.9586 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5903 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121506321 # total number of slip cycles
avg_sim_slip                10.4914 # the average slip between issue and retirement
bpred_bimod.lookups         3257776 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442060 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820428 # total number of accesses
il1.hits                   12820211 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18212 # total number of hits
ul2.misses                      289 # total number of misses
ul2.replacements                 34 # total number of replacements
ul2.writebacks                   21 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0156 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0018 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0011 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820428 # total number of accesses
itlb.hits                  12820421 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918790 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
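
Each statistics line above has the form `<name> <value> # <description>`.
As a rough sketch (parse_stats is a hypothetical helper, not a SimpleScalar
tool), the following Python reads such lines into a dictionary and recomputes
sim_IPC and sim_CPI from sim_num_insn and sim_cycle.  Lines whose value is
not a plain number (hex addresses, `96k', `<error: divide by zero>') are
skipped.  Option lines such as `-seed 1 # ...' share the same shape, so a
caller should feed in only the statistics portion of the log.

```python
def parse_stats(lines):
    """Map stat name -> float for lines shaped '<name> <value> # <description>'."""
    stats = {}
    for line in lines:
        # Drop the trailing description, then expect exactly two tokens.
        body = line.split("#", 1)[0].split()
        if len(body) != 2:
            continue
        name, value = body
        try:
            stats[name] = float(value)
        except ValueError:
            # Non-numeric values (e.g. '0x00400000', '96k') are ignored.
            continue
    return stats


# Four lines copied from the sort run above.
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6262004 # total simulation time in cycles
sim_IPC                      1.8495 # instructions per cycle
sim_CPI                      0.5407 # cycles per instruction
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE.splitlines())
    ipc = stats["sim_num_insn"] / stats["sim_cycle"]  # committed insts per cycle
    cpi = stats["sim_cycle"] / stats["sim_num_insn"]  # cycles per committed inst
    print(f"recomputed IPC = {ipc:.4f} (reported {stats['sim_IPC']:.4f})")
    print(f"recomputed CPI = {cpi:.4f} (reported {stats['sim_CPI']:.4f})")
```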

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:55:16 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1024685.2308 # simulation speed (in insts/sec)
sim_total_insn             13376389 # total number of instructions executed
sim_total_refs              6748338 # total number of loads and stores executed
sim_total_loads             3824313 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                  13116009 # total simulation time in cycles
sim_IPC                      1.0156 # instructions per cycle
sim_CPI                      0.9846 # cycles per instruction
sim_exec_BW                  1.0199 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  51543077 # cumulative IFQ occupancy
IFQ_fcount                 12734752 # cumulative IFQ full count
ifq_occupancy                3.9298 # avg IFQ occupancy (insn's)
ifq_rate                     1.0199 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.8533 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9709 # fraction of time (cycle's) IFQ was full
RUU_count                 207103138 # cumulative RUU occupancy
RUU_fcount                 12594226 # cumulative RUU full count
ruu_occupancy               15.7901 # avg RUU occupancy (insn's)
ruu_rate                     1.0199 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 15.4827 # avg RUU occupant latency (cycle's)
ruu_full                     0.9602 # fraction of time (cycle's) RUU was full
LSQ_count                 108141360 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2450 # avg LSQ occupancy (insn's)
lsq_rate                     1.0199 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  8.0845 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  334950233 # total number of slip cycles
avg_sim_slip                25.1447 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400120 # total number of accesses
il1.hits                   13399393 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169525 # total number of accesses
dl1.hits                    5881793 # total number of hits
dl1.misses                   287732 # total number of misses
dl1.replacements             286708 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431903 # total number of accesses
ul2.hits                     412578 # total number of hits
ul2.misses                    19325 # total number of misses
ul2.replacements              19069 # total number of replacements
ul2.writebacks                17306 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0447 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0442 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0401 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400120 # total number of accesses
itlb.hits                  13400101 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736797 # total number of hits
dtlb.misses                    4201 # total number of misses
dtlb.replacements              4073 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766324 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:55:29 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
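
  The range grammar above can be illustrated with a small sketch.  This is
  only an interpretation of the format as described in this help text, not
  the simulator's own parser; the names Endpoint, parse_endpoint and
  parse_ptrace_range are hypothetical.  Values are kept as text because
  program symbols (e.g., main) are allowed in place of numbers.

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    kind: str   # "address", "cycle", "inst", or "relative"
    value: str  # kept as text, since a program symbol may appear here


def parse_endpoint(text: str, allow_relative: bool) -> Optional[Endpoint]:
    """Classify one side of the range; an empty side means 'unbounded'."""
    if text == "":
        return None
    prefix_kinds = {"@": "address", "#": "cycle"}
    if allow_relative:
        prefix_kinds["+"] = "relative"  # '+' is only described for <end>
    kind = prefix_kinds.get(text[0])
    if kind is None:
        if text[0] == "+":
            raise ValueError("'+' is only valid on the end of the range")
        # No prefix: the value is an instruction count.
        return Endpoint("inst", text)
    if len(text) == 1:
        raise ValueError(f"prefix {text[0]!r} needs a value")
    return Endpoint(kind, text[1:])


def parse_ptrace_range(text: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' and classify both sides."""
    start_text, sep, end_text = text.partition(":")
    if not sep:
        raise ValueError("a pipetrace range must contain ':'")
    return parse_endpoint(start_text, False), parse_endpoint(end_text, True)


if __name__ == "__main__":
    for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(example, "->", parse_ptrace_range(example))
```

  Resolving a "relative" end against the start, and looking up symbols,
  is left out because it depends on the loaded program.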

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
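
  As a rough illustration of the <config> format, the sketch below splits
  a config string into its five fields and derives the total capacity as
  nsets * bsize * assoc.  The CacheConfig class and parse_cache_config
  function are hypothetical helpers written for this example; they are not
  part of SimpleScalar.  For a TLB config, bsize is the page size, so the
  same product gives the address range the TLB can map rather than a data
  capacity.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text above.
REPLACEMENT_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Every set holds `assoc` blocks of `bsize` bytes each.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # The two configs below are taken from this run's command line.
    for spec in ("dl1:512:64:2:l", "ul2:32:1024:8:l"):
        cfg = parse_cache_config(spec)
        print(cfg.name, cfg.capacity_bytes // 1024, "KiB",
              REPLACEMENT_POLICIES[cfg.repl])
```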

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450120 # total number of instructions executed
sim_total_refs              6589103 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12492277 # total simulation time in cycles
sim_IPC                      1.7114 # instructions per cycle
sim_CPI                      0.5843 # cycles per instruction
sim_exec_BW                  1.7171 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48827248 # cumulative IFQ occupancy
IFQ_fcount                 11586923 # cumulative IFQ full count
ifq_occupancy                3.9086 # avg IFQ occupancy (insn's)
ifq_rate                     1.7171 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2763 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9275 # fraction of time (cycle's) IFQ was full
RUU_count                 198424992 # cumulative RUU occupancy
RUU_fcount                 12363922 # cumulative RUU full count
ruu_occupancy               15.8838 # avg RUU occupancy (insn's)
ruu_rate                     1.7171 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2505 # avg RUU occupant latency (cycle's)
ruu_full                     0.9897 # fraction of time (cycle's) RUU was full
LSQ_count                  63627462 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0933 # avg LSQ occupancy (insn's)
lsq_rate                     1.7171 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9663 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289835742 # total number of slip cycles
avg_sim_slip                13.5569 # the average slip between issue and retirement
bpred_bimod.lookups         1654485 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           77 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483029 # total number of accesses
il1.hits                   21482850 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944146 # total number of accesses
dl1.hits                    4941015 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3929 # total number of hits
ul2.misses                      172 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0419 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483029 # total number of accesses
itlb.hits                  21483023 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886370 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
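
The derived statistics in a block like the one above (IPC, miss rates) can be re-checked from the raw counters. The following is a minimal sketch that assumes each statistics line has the shape `name value # description`; the parse_stats helper and the STAT_LINE pattern are hypothetical and written only for this example. Values that are not plain numbers (hex addresses, sizes such as 120k, or the divide-by-zero error text) are kept as strings.

```python
import re

# name, then the value (which may contain spaces), then '# description'.
STAT_LINE = re.compile(r"^(?P<name>\S+)\s+(?P<value>.+?)\s+#\s+(?P<desc>.*)$")

# A few lines copied from the `filter` run above.
SAMPLE = """
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12492277 # total simulation time in cycles
dl1.accesses                4944146 # total number of accesses
dl1.misses                     3131 # total number of misses
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
"""


def parse_stats(lines):
    """Return {stat name: float or raw string} for every statistics line."""
    stats = {}
    for line in lines:
        # Skip option dumps ('-seed 1 # ...') and commented options ('# -h ...').
        if line.startswith("-") or line.startswith("#"):
            continue
        match = STAT_LINE.match(line)
        if match is None:
            continue
        try:
            stats[match["name"]] = float(match["value"])
        except ValueError:
            stats[match["name"]] = match["value"]
    return stats


if __name__ == "__main__":
    stats = parse_stats(SAMPLE.splitlines())
    ipc = stats["sim_num_insn"] / stats["sim_cycle"]
    dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
    print(f"IPC           {ipc:.4f}")
    print(f"dl1 miss rate {dl1_miss_rate:.4f}")
```

Comparing the recomputed values with the reported sim_IPC and dl1.miss_rate lines is a quick sanity check that a log was not truncated or mixed with another run.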

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:55:42 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 22 # total simulation time in seconds
sim_inst_rate          1266349.6364 # simulation speed (in insts/sec)
sim_total_insn             27862467 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  36179429 # total simulation time in cycles
sim_IPC                      0.7700 # instructions per cycle
sim_CPI                      1.2986 # cycles per instruction
sim_exec_BW                  0.7701 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 144643255 # cumulative IFQ occupancy
IFQ_fcount                 36160572 # cumulative IFQ full count
ifq_occupancy                3.9979 # avg IFQ occupancy (insn's)
ifq_rate                     0.7701 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.1913 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 578573509 # cumulative RUU occupancy
RUU_fcount                 36159544 # cumulative RUU full count
ruu_occupancy               15.9918 # avg RUU occupancy (insn's)
ruu_rate                     0.7701 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.7653 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 173424916 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7935 # avg LSQ occupancy (insn's)
lsq_rate                     0.7701 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2243 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  788506931 # total number of slip cycles
avg_sim_slip                28.3028 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860776 # total number of accesses
il1.hits                   27860565 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5831566 # total number of hits
dl1.misses                  2821832 # total number of misses
dl1.replacements            2820808 # total number of replacements
dl1.writebacks               953977 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3261 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3260 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3776020 # total number of accesses
ul2.hits                    3771720 # total number of hits
ul2.misses                     4300 # total number of misses
ul2.replacements               4044 # total number of replacements
ul2.writebacks                 1354 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0011 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0011 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860776 # total number of accesses
itlb.hits                  27860770 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017686 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
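
The derived ratios in the statistics block above (sim_IPC, sim_CPI, sim_IPB, dl1.miss_rate) follow directly from the raw counters. The sketch below is a minimal, hypothetical helper for recomputing them from the log text; it is not part of SimpleScalar. The name `parse_stats` and the regular expression are assumptions made for illustration, and only lines of the form `<name> <number> # <description>` are handled.

```python
import re

# Assumed line shape: "<name> <value> # <description>", with a name that starts
# with a letter (this skips option lines such as "-seed" and "# -h").
STAT_LINE = re.compile(r"^([A-Za-z]\S*)\s+(\S+)\s+#")


def parse_stats(text):
    """Return {stat name: float value}; non-numeric values are skipped."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if not m:
            continue
        try:
            stats[m.group(1)] = float(m.group(2))
        except ValueError:
            pass  # e.g. hex addresses or sizes such as "1480k"
    return stats


# Counters copied from the statistics block above.
sample = """\
sim_num_insn               27859692 # total number of instructions committed
sim_num_branches             481706 # total number of branches committed
sim_cycle                  36179429 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2821832 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

s = parse_stats(sample)
ipc = s["sim_num_insn"] / s["sim_cycle"]          # instructions per cycle
cpi = s["sim_cycle"] / s["sim_num_insn"]          # cycles per instruction
ipb = s["sim_num_insn"] / s["sim_num_branches"]   # instructions per branch
dl1_miss_rate = s["dl1.misses"] / s["dl1.accesses"]

print(f"IPC={ipc:.4f} CPI={cpi:.4f} IPB={ipb:.4f} dl1.miss_rate={dl1_miss_rate:.4f}")
```

Recomputing the ratios this way is a quick consistency check on a log; it should agree with the values the simulator printed to four decimal places.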

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:56:04 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
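
  The capacity implied by a <config> string is nsets * bsize * assoc. The
  sketch below is a hypothetical helper, not part of SimpleScalar: the names
  `CacheConfig` and `parse_cache_config` are assumptions for illustration,
  and <bsize> is assumed to be in bytes. It splits a config string into its
  five fields and reports the resulting capacity, using the dl1 and ul2
  configurations from the command line of this run.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str    # name of the cache being defined, e.g. "dl1"
    nsets: int   # number of sets
    bsize: int   # block size (assumed to be in bytes)
    assoc: int   # associativity (ways per set)
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def size_bytes(self) -> int:
        # Total capacity: sets x ways per set x bytes per block.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse "<name>:<nsets>:<bsize>:<assoc>:<repl>" into a CacheConfig."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields: {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement strategy {repl!r} in {spec!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "ul2:16:4096:8:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.size_bytes // 1024} KiB, "
          f"{cfg.assoc}-way, {cfg.bsize}-byte blocks")
```

  Computing the capacity makes it easier to see when two runs differ only in
  cache geometry (sets versus block size) rather than in total size.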

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148987 # total number of instructions executed
sim_total_refs              4034199 # total number of loads and stores executed
sim_total_loads             3020642 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010951 # total number of branches executed
sim_cycle                   6747958 # total simulation time in cycles
sim_IPC                      1.9424 # instructions per cycle
sim_CPI                      0.5148 # cycles per instruction
sim_exec_BW                  1.9486 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25108418 # cumulative IFQ occupancy
IFQ_fcount                  6151504 # cumulative IFQ full count
ifq_occupancy                3.7209 # avg IFQ occupancy (insn's)
ifq_rate                     1.9486 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9095 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9116 # fraction of time (cycle's) IFQ was full
RUU_count                 103707953 # cumulative RUU occupancy
RUU_fcount                  5587624 # cumulative RUU full count
ruu_occupancy               15.3688 # avg RUU occupancy (insn's)
ruu_rate                     1.9486 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8871 # avg RUU occupant latency (cycle's)
ruu_full                     0.8280 # fraction of time (cycle's) RUU was full
LSQ_count                  31656528 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6913 # avg LSQ occupancy (insn's)
lsq_rate                     1.9486 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4075 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152414849 # total number of slip cycles
avg_sim_slip                11.6284 # the average slip between issue and retirement
bpred_bimod.lookups         1010971 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000565 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189482 # total number of accesses
il1.hits                   13189304 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013192 # total number of accesses
dl1.hits                    4007465 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6355 # total number of hits
ul2.misses                       42 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189482 # total number of accesses
itlb.hits                  13189476 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357423 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:56:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264941 # total number of instructions executed
sim_total_refs              4823983 # total number of loads and stores executed
sim_total_loads             2865508 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197077 # total number of branches executed
sim_cycle                   6265404 # total simulation time in cycles
sim_IPC                      1.8485 # instructions per cycle
sim_CPI                      0.5410 # cycles per instruction
sim_exec_BW                  1.9576 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18377126 # cumulative IFQ occupancy
IFQ_fcount                  3765938 # cumulative IFQ full count
ifq_occupancy                2.9331 # avg IFQ occupancy (insn's)
ifq_rate                     1.9576 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.4983 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6011 # fraction of time (cycle's) IFQ was full
RUU_count                  75750458 # cumulative RUU occupancy
RUU_fcount                  3161936 # cumulative RUU full count
ruu_occupancy               12.0903 # avg RUU occupancy (insn's)
ruu_rate                     1.9576 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1762 # avg RUU occupant latency (cycle's)
ruu_full                     0.5047 # fraction of time (cycle's) RUU was full
LSQ_count                  31799059 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0753 # avg LSQ occupancy (insn's)
lsq_rate                     1.9576 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5927 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121615233 # total number of slip cycles
avg_sim_slip                10.5008 # the average slip between issue and retirement
bpred_bimod.lookups         3257788 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       442071 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820458 # total number of accesses
il1.hits                   12820241 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18428 # total number of hits
ul2.misses                       73 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0039 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820458 # total number of accesses
itlb.hits                  12820451 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918912 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
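
The derived figures in the statistics block above follow from the raw counters: sim_IPC is sim_num_insn / sim_cycle, sim_CPI is its reciprocal, and sim_exec_BW is sim_total_insn / sim_cycle. The Python sketch below is one possible way to check that from the log text. The helper name parse_stats and the regular expression are illustrative assumptions and are not part of SimpleScalar.

```python
import re

# Hypothetical helper, not part of SimpleScalar: matches statistics lines
# of the form "<name> <value> # <description>". Option lines (which start
# with "-" or "#") do not match because names must start with a letter.
STAT_RE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#")

def parse_stats(text):
    """Return {stat_name: float} for each numeric statistics line."""
    stats = {}
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if not m:
            continue
        try:
            stats[m.group(1)] = float(m.group(2))
        except ValueError:
            # Non-numeric values such as 0x00400000 or 292k are skipped.
            continue
    return stats

# Lines copied from the statistics block above.
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_total_insn             12264941 # total number of instructions executed
sim_cycle                   6265404 # total simulation time in cycles
sim_IPC                      1.8485 # instructions per cycle
sim_CPI                      0.5410 # cycles per instruction
sim_exec_BW                  1.9576 # total instructions (mis-spec + committed) per cycle
"""

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
exec_bw = stats["sim_total_insn"] / stats["sim_cycle"]
print(f"IPC={ipc:.4f} CPI={cpi:.4f} exec_BW={exec_bw:.4f}")
```

The recomputed values should agree with the reported ones to the four decimal places the simulator prints.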

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:56:20 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
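
  The range syntax above can be illustrated with a small Python sketch. It is
  not SimpleScalar's own parser: the function names, the (kind, value) tuples,
  and the handling of numeric bases are assumptions made for illustration, and
  symbols such as main are kept as strings rather than resolved to addresses.

```python
# Illustrative sketch only; SimpleScalar's real range parser may differ.
PREFIXES = {"@": "addr", "#": "cycle", "+": "relative"}

def parse_endpoint(token):
    """Turn one endpoint into (kind, value), or None if it is omitted."""
    if token == "":
        return None
    if token[0] in PREFIXES:
        kind, body = PREFIXES[token[0]], token[1:]
    else:
        kind, body = "inst", token  # no prefix: instruction count
    try:
        value = int(body, 0)  # accepts decimal and 0x... literals
    except ValueError:
        value = body  # anything else is kept as a symbol name
    return (kind, value)

def parse_ptrace_range(spec):
    """Parse '<start>:<end>' into a (start, end) pair of endpoints."""
    start_tok, sep, end_tok = spec.partition(":")
    if sep != ":":
        raise ValueError(f"missing ':' in range {spec!r}")
    start = parse_endpoint(start_tok)
    end = parse_endpoint(end_tok)
    if end is not None and end[0] == "relative":
        if start is None:
            raise ValueError("a '+' end needs a start to be relative to")
        kind, base = start
        offset = end[1]
        if isinstance(base, int) and isinstance(offset, int):
            end = (kind, base + offset)
        else:
            # Symbolic start: leave the sum unresolved.
            end = (kind, f"{base}+{offset}")
    return start, end

for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"):
    print(spec, "->", parse_ptrace_range(spec))
```

  Under these assumptions, 1000:+100 and 1000:1100 parse to the same pair of
  instruction-count endpoints, matching the equivalence stated above.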

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in N, M, W, X order, as taken by -bpred:2lev):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
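
  As a rough illustration of the <config> format, the Python sketch below
  splits a config string into its five fields and derives the total capacity
  as nsets * bsize * assoc. The CacheConfig class and parse_cache_config
  function are hypothetical names, not part of SimpleScalar. The two strings
  used are the dl1 and ul2 settings from the command line of this run.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text above.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}

@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for <name>:<nsets>:<bsize>:<assoc>:<repl>."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Total capacity = sets x block size x ways.
        return self.nsets * self.bsize * self.assoc

def parse_cache_config(spec):
    """Parse one cache config string; raise ValueError if malformed."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields in {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)

for spec in ("dl1:512:64:2:l", "ul2:16:4096:8:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KB", REPLACEMENT[cfg.repl])
```

  For a TLB config the same product gives the address range covered rather
  than a storage size, since <bsize> is then the page size.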

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13381852 # total number of instructions executed
sim_total_refs              6748334 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                   8945923 # total simulation time in cycles
sim_IPC                      1.4890 # instructions per cycle
sim_CPI                      0.6716 # cycles per instruction
sim_exec_BW                  1.4959 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  34720837 # cumulative IFQ occupancy
IFQ_fcount                  8529479 # cumulative IFQ full count
ifq_occupancy                3.8812 # avg IFQ occupancy (insn's)
ifq_rate                     1.4959 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.5946 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9534 # fraction of time (cycle's) IFQ was full
RUU_count                 139857876 # cumulative RUU occupancy
RUU_fcount                  8386924 # cumulative RUU full count
ruu_occupancy               15.6337 # avg RUU occupancy (insn's)
ruu_rate                     1.4959 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.4513 # avg RUU occupant latency (cycle's)
ruu_full                     0.9375 # fraction of time (cycle's) RUU was full
LSQ_count                  72457489 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.0995 # avg LSQ occupancy (insn's)
lsq_rate                     1.4959 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.4146 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  231965186 # total number of slip cycles
avg_sim_slip                17.4136 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400110 # total number of accesses
il1.hits                   13399383 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169527 # total number of accesses
dl1.hits                    5881803 # total number of hits
dl1.misses                   287724 # total number of misses
dl1.replacements             286700 # total number of replacements
dl1.writebacks               143442 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     431198 # total number of hits
ul2.misses                      695 # total number of misses
ul2.replacements                567 # total number of replacements
ul2.writebacks                  456 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0016 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0013 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0011 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400110 # total number of accesses
itlb.hits                  13400091 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736794 # total number of hits
dtlb.misses                    4204 # total number of misses
dtlb.replacements              4076 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766272 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
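
The queue statistics above appear to be related in a simple way: average occupancy is the cumulative count divided by sim_cycle, the dispatch rate is sim_total_insn / sim_cycle, the average occupant latency is occupancy divided by rate, and the "full" fraction is the full count divided by sim_cycle. The Python sketch below checks this against the counters of the fft run. The relations are inferred from the numbers in this log, not taken from the simulator source, and queue_metrics is a hypothetical helper.

```python
# Counters copied from the fft statistics block above.
sim_cycle = 8945923
sim_total_insn = 13381852
queues = {
    "ifq": {"count": 34720837, "fcount": 8529479},
    "ruu": {"count": 139857876, "fcount": 8386924},
    "lsq": {"count": 72457489, "fcount": 0},
}

def queue_metrics(count, fcount, cycles, insns):
    """Derive per-queue averages from cumulative counters (assumed relations)."""
    occupancy = count / cycles    # average entries in use per cycle
    rate = insns / cycles         # instructions dispatched per cycle
    latency = occupancy / rate    # average cycles an entry stays queued
    full = fcount / cycles        # fraction of cycles the queue was full
    return {"occupancy": occupancy, "rate": rate,
            "latency": latency, "full": full}

derived = {
    name: queue_metrics(q["count"], q["fcount"], sim_cycle, sim_total_insn)
    for name, q in queues.items()
}

for name, m in derived.items():
    print(f"{name}: occupancy={m['occupancy']:.4f} rate={m['rate']:.4f} "
          f"latency={m['latency']:.4f} full={m['full']:.4f}")
```

The occupancy, rate and latency relation is an instance of Little's law, which is why the RUU latency rises as the RUU fills up even though the dispatch rate is the same for all three queues.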

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:56:31 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450132 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939055 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12496511 # total simulation time in cycles
sim_IPC                      1.7108 # instructions per cycle
sim_CPI                      0.5845 # cycles per instruction
sim_exec_BW                  1.7165 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48839715 # cumulative IFQ occupancy
IFQ_fcount                 11590003 # cumulative IFQ full count
ifq_occupancy                3.9083 # avg IFQ occupancy (insn's)
ifq_rate                     1.7165 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2769 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9275 # fraction of time (cycle's) IFQ was full
RUU_count                 198467686 # cumulative RUU occupancy
RUU_fcount                 12366857 # cumulative RUU full count
ruu_occupancy               15.8818 # avg RUU occupancy (insn's)
ruu_rate                     1.7165 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2525 # avg RUU occupant latency (cycle's)
ruu_full                     0.9896 # fraction of time (cycle's) RUU was full
LSQ_count                  63640868 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0927 # avg LSQ occupancy (insn's)
lsq_rate                     1.7165 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9669 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289891773 # total number of slip cycles
avg_sim_slip                13.5595 # the average slip between issue and retirement
bpred_bimod.lookups         1654488 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483041 # total number of accesses
il1.hits                   21482862 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944195 # total number of accesses
dl1.hits                    4941064 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       4056 # total number of hits
ul2.misses                       45 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0110 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483041 # total number of accesses
itlb.hits                  21483035 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886420 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
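
The derived figures in the statistics block above can be recomputed from the raw counters. The following is a minimal Python sketch, assuming each statistic line has the `name value # description` layout shown in this log; `parse_stats` is a hypothetical helper written for illustration and is not part of SimpleScalar.

```python
def parse_stats(text):
    """Parse 'name value # description' lines into a dict of floats.

    Lines whose value is not a plain number (hex addresses, '120k',
    '<error: divide by zero>') are skipped. Hypothetical helper.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue
    return stats


# Three counters copied from the matrix run above.
sample = """\
sim_num_insn               21379256 # total number of instructions committed
sim_num_branches            1647342 # total number of branches committed
sim_cycle                  12496511 # total simulation time in cycles
"""

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]         # compare with sim_IPC
cpi = stats["sim_cycle"] / stats["sim_num_insn"]         # compare with sim_CPI
ipb = stats["sim_num_insn"] / stats["sim_num_branches"]  # compare with sim_IPB
print(f"IPC={ipc:.4f} CPI={cpi:.4f} IPB={ipb:.4f}")
```

The same helper can be pointed at a whole statistics block to cross-check the other ratio lines against their counters.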

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:56:45 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
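
  The range grammar above can be illustrated with a small parser. This is a minimal Python sketch under stated assumptions: `parse_ptrace_range` and `_parse_end` are hypothetical names, numeric values are read as decimal, and symbols such as `main` are kept as strings because resolving them needs the program's symbol table. It is not the simulator's own parser.

```python
def _parse_end(token, allow_relative=False):
    """Parse one end of a range; return None when the end is omitted."""
    if not token:
        return None
    kind, relative = "inst", False          # default: instruction count
    if token[0] == "@":
        kind, token = "addr", token[1:]     # address range
    elif token[0] == "#":
        kind, token = "cycle", token[1:]    # cycle count range
    elif token[0] == "+" and allow_relative:
        relative, token = True, token[1:]   # relative to the start value
    if not token:
        raise ValueError("range end has a prefix but no value")
    value = int(token) if token.isdigit() else token
    return {"kind": kind, "value": value, "relative": relative}


def parse_ptrace_range(spec):
    """Split '<start>:<end>' and resolve a '+' end when the start is numeric."""
    start_tok, sep, end_tok = spec.partition(":")
    if sep != ":":
        raise ValueError("range must contain ':'")
    start = _parse_end(start_tok)
    end = _parse_end(end_tok, allow_relative=True)
    if end is not None and end["relative"]:
        if start is None:
            raise ValueError("'+' end needs a start value")
        end["kind"] = start["kind"]
        if isinstance(start["value"], int) and isinstance(end["value"], int):
            end["value"] += start["value"]
            end["relative"] = False
    return start, end


for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"]:
    print(spec, "->", parse_ptrace_range(spec))
```

  With a symbolic start such as `@main`, the sketch leaves the `+278` offset unresolved and only marks it as relative.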

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
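
  The sample-predictor table can be read as a classification rule. Below is a minimal Python sketch, assuming the arguments are given in the option order `<l1size> <l2size> <hist_size> <xor>` (N, M, W, X). `classify_2lev` is a hypothetical helper, and treating N > 1 as "per-address history" is an assumption made here to separate PAg/PAp from GAg/GAp; the simulator itself does not print these names.

```python
def classify_2lev(l1size, l2size, hist_size, xor):
    """Name the two-level scheme for one -bpred:2lev setting.

    Follows the sample table in the help text; anything that does not
    match a listed pattern is reported as 'other'.
    """
    n, m, w, x = l1size, l2size, hist_size, xor
    pattern_entries = 2 ** w          # entries addressable by W history bits
    if n == 1:                        # single (global) history register
        if m == pattern_entries:
            return "gshare" if x else "GAg"
        if m > pattern_entries and not x:
            return "GAp"
    elif n > 1 and not x:             # assumption: several history registers
        if m == pattern_entries:
            return "PAg"
        if m == 2 ** (n + w):         # size relation as stated in the help text
            return "PAp"
    return "other"


# The setting echoed in this log is "-bpred:2lev 1 1024 8 0".
print(classify_2lev(1, 1024, 8, 0))
```

  Note that the runs in this log use `-bpred bimod`, so the 2-level setting is echoed but not exercised.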

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
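
  To make the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format concrete, here is a minimal Python sketch that splits a config string and derives the total capacity as nsets x bsize x assoc. `CacheConfig` and `parse_cache_config` are hypothetical names for illustration, not SimpleScalar code; for a TLB the block size is the page size, so the product is the address range covered rather than storage.

```python
from dataclasses import dataclass

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def size_bytes(self):
        """Total capacity: sets x block size x ways."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {len(parts)}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


# Configurations that appear on the command lines in this log.
for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "ul2:16:4096:8:l"]:
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.size_bytes // 1024, "KB", cfg.repl)
```

  Both `ul2` variants used in these runs have the same product of sets, block size, and associativity; they differ only in how that capacity is split between sets and block size.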

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860691 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  36259901 # total simulation time in cycles
sim_IPC                      0.7683 # instructions per cycle
sim_CPI                      1.3015 # cycles per instruction
sim_exec_BW                  0.7684 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 144942243 # cumulative IFQ occupancy
IFQ_fcount                 36235320 # cumulative IFQ full count
ifq_occupancy                3.9973 # avg IFQ occupancy (insn's)
ifq_rate                     0.7684 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2024 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9993 # fraction of time (cycle's) IFQ was full
RUU_count                 579769516 # cumulative RUU occupancy
RUU_fcount                 36234747 # cumulative RUU full count
ruu_occupancy               15.9893 # avg RUU occupancy (insn's)
ruu_rate                     0.7684 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.8096 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 173554183 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7864 # avg LSQ occupancy (insn's)
lsq_rate                     0.7684 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2294 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  789832649 # total number of slip cycles
avg_sim_slip                28.3504 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860774 # total number of accesses
il1.hits                   27860563 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5805632 # total number of hits
dl1.misses                  2847766 # total number of misses
dl1.replacements            2846742 # total number of replacements
dl1.writebacks               953640 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3291 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3290 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3801617 # total number of accesses
ul2.hits                    3800533 # total number of hits
ul2.misses                     1084 # total number of misses
ul2.replacements                956 # total number of replacements
ul2.writebacks                  318 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0003 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0003 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860774 # total number of accesses
itlb.hits                  27860768 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017678 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
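
The cache counters in the alphaBlend statistics above can be cross-checked against each other. The sketch below is a minimal Python example using numbers copied from that block. That the unified L2 access count equals L1 instruction misses plus L1 data misses plus L1 data writebacks is an observation that holds for these numbers, not a documented guarantee of the simulator.

```python
# Counters copied from the alphaBlend run above.
stats = {
    "il1.misses": 211,
    "dl1.accesses": 8653398,
    "dl1.misses": 2847766,
    "dl1.writebacks": 953640,
    "ul2.accesses": 3801617,
    "ul2.misses": 1084,
}

# Traffic reaching the unified L2: L1 misses from both sides plus
# dirty blocks written back from the L1 data cache.
l2_traffic = stats["il1.misses"] + stats["dl1.misses"] + stats["dl1.writebacks"]

# Miss rate is misses divided by accesses, as in the '*.miss_rate' lines.
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]

print("L2 traffic matches ul2.accesses:", l2_traffic == stats["ul2.accesses"])
print(f"dl1 miss rate {dl1_miss_rate:.4f}, ul2 miss rate {ul2_miss_rate:.4f}")
```

Roughly one in three L1 data accesses misses in this run, while almost all of those misses hit in the L2, which is consistent with the lower IPC reported for alphaBlend than for the matrix runs.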

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:57:08 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6795234 # total simulation time in cycles
sim_IPC                      1.9289 # instructions per cycle
sim_CPI                      0.5184 # cycles per instruction
sim_exec_BW                  1.9350 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25301584 # cumulative IFQ occupancy
IFQ_fcount                  6199793 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insn's)
ifq_rate                     1.9350 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9242 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104484071 # cumulative RUU occupancy
RUU_fcount                  5635954 # cumulative RUU full count
ruu_occupancy               15.3761 # avg RUU occupancy (insn's)
ruu_rate                     1.9350 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9462 # avg RUU occupant latency (cycle's)
ruu_full                     0.8294 # fraction of time (cycle's) RUU was full
LSQ_count                  31917162 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6970 # avg LSQ occupancy (insn's)
lsq_rate                     1.9350 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4273 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153452525 # total number of slip cycles
avg_sim_slip                11.7076 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357595 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
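
Several of the statistics above are ratios of other counters in the same
block. As a cross-check, the following sketch recomputes a few of them from
the raw counts of this run. The helper `parse_stats` is hypothetical and not
part of SimpleScalar, and the sketch assumes the occupancy averages are
cumulative counts divided by sim_cycle.

```python
# A few lines copied from the statistics block of this run.
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_num_branches            1010850 # total number of branches committed
sim_cycle                   6795234 # total simulation time in cycles
RUU_count                 104484071 # cumulative RUU occupancy
ul2.accesses                   6397 # total number of accesses
ul2.misses                     2268 # total number of misses
"""


def parse_stats(text):
    """Map stat name -> numeric value for lines shaped 'name value # description'."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # skip blank lines and non-numeric entries
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue
    return stats


stats = parse_stats(SAMPLE)
derived = {
    "sim_IPC": stats["sim_num_insn"] / stats["sim_cycle"],
    "sim_CPI": stats["sim_cycle"] / stats["sim_num_insn"],
    "sim_IPB": stats["sim_num_insn"] / stats["sim_num_branches"],
    "ruu_occupancy": stats["RUU_count"] / stats["sim_cycle"],
    "ul2.miss_rate": stats["ul2.misses"] / stats["ul2.accesses"],
}
for name, value in derived.items():
    print(f"{name:<16}{value:>10.4f}")
```

Rounded to four decimals, the recomputed values agree with the reported
sim_IPC (1.9289), sim_CPI (0.5184), sim_IPB (12.9664), ruu_occupancy
(15.3761), and ul2.miss_rate (0.3545).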

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:57:16 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
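
The -mem:lat and -mem:width options above together determine how long a
cache block fill takes. The sketch below uses the latency model of the stock
sim-outorder (first chunk latency plus one inter-chunk latency per additional
bus-width chunk). It is an assumption that this build follows the same model:
the -mem:maxBurstLength and -mem:minBurstLength options are not modeled here,
and the function name `block_fill_latency` is hypothetical.

```python
def block_fill_latency(block_bytes, bus_width_bytes, first_chunk, inter_chunk):
    """Cycles to transfer one block, assuming the stock chunked-transfer model.

    Burst-length limits (-mem:maxBurstLength / -mem:minBurstLength) are
    deliberately ignored; their effect in this simulator build is unknown.
    """
    if block_bytes <= 0 or bus_width_bytes <= 0:
        raise ValueError("block size and bus width must be positive")
    # Number of bus-width chunks needed, rounded up.
    chunks = (block_bytes + bus_width_bytes - 1) // bus_width_bytes
    return first_chunk + inter_chunk * (chunks - 1)


# Values from this run: 64-byte blocks, -mem:width 16, -mem:lat 60 3.
print(block_fill_latency(64, 16, 60, 3))
```

With these settings a 64-byte block needs four 16-byte chunks, giving
60 + 3 * 3 = 69 cycles under the assumed model.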

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264653 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6331415 # total simulation time in cycles
sim_IPC                      1.8292 # instructions per cycle
sim_CPI                      0.5467 # cycles per instruction
sim_exec_BW                  1.9371 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18644375 # cumulative IFQ occupancy
IFQ_fcount                  3832934 # cumulative IFQ full count
ifq_occupancy                2.9447 # avg IFQ occupancy (insn's)
ifq_rate                     1.9371 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5202 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6054 # fraction of time (cycle's) IFQ was full
RUU_count                  76823323 # cumulative RUU occupancy
RUU_fcount                  3231021 # cumulative RUU full count
ruu_occupancy               12.1337 # avg RUU occupancy (insn's)
ruu_rate                     1.9371 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2638 # avg RUU occupant latency (cycle's)
ruu_full                     0.5103 # fraction of time (cycle's) RUU was full
LSQ_count                  32026513 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0583 # avg LSQ occupancy (insn's)
lsq_rate                     1.9371 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6113 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122916932 # total number of slip cycles
avg_sim_slip                10.6132 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917920 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
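
The branch predictor rates in the block above are defined in their own
descriptions as ratios of hit counters to totals (for example,
addr-hits/updates). The sketch below recomputes them from this run's counters
and shows how a zero denominator leads to an entry such as
`<error: divide by zero>`. The helpers `rate` and `fmt` are hypothetical and
only mimic the printed format.

```python
def rate(hits, total):
    """Return hits/total, or None when the denominator is zero."""
    return hits / total if total else None


def fmt(value):
    """Format a rate the way the statistics block prints it."""
    return "<error: divide by zero>" if value is None else f"{value:.4f}"


# Counters copied from the bpred_bimod.* lines of this run.
updates = 3128464
rates = {
    "bpred_addr_rate": rate(2984765, updates),      # addr-hits / updates
    "bpred_dir_rate": rate(2990777, updates),       # all-hits / updates
    "bpred_jr_rate": rate(427904, 434901),          # JR addr-hits / JRs seen
    "bpred_jr_non_ras_rate": rate(1223, 8193),      # non-RAS JR hits / seen
    "ras_rate": rate(426681, 426708),               # RAS hits / used RAS
}
for name, value in rates.items():
    print(f"{name:<24}{fmt(value)}")

# A run with no non-RAS JRs (0 hits, 0 seen) has no defined rate.
print(fmt(rate(0, 0)))
```

The recomputed values round to 0.9541, 0.9560, 0.9839, 0.1493, and 0.9999,
matching the reported rates.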

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:57:23 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375020 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9124462 # total simulation time in cycles
sim_IPC                      1.4599 # instructions per cycle
sim_CPI                      0.6850 # cycles per instruction
sim_exec_BW                  1.4658 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  35562210 # cumulative IFQ occupancy
IFQ_fcount                  8739908 # cumulative IFQ full count
ifq_occupancy                3.8975 # avg IFQ occupancy (insn's)
ifq_rate                     1.4658 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6589 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9579 # fraction of time (cycle's) IFQ was full
RUU_count                 143158534 # cumulative RUU occupancy
RUU_fcount                  8599012 # cumulative RUU full count
ruu_occupancy               15.6895 # avg RUU occupancy (insn's)
ruu_rate                     1.4658 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.7034 # avg RUU occupant latency (cycle's)
ruu_full                     0.9424 # fraction of time (cycle's) RUU was full
LSQ_count                  74962824 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2156 # avg LSQ occupancy (insn's)
lsq_rate                     1.4658 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6047 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  237836944 # total number of slip cycles
avg_sim_slip                17.8544 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766208 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
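
The rate statistics above are derived from the raw counters, so they can be
sanity-checked offline.  The Python sketch below assumes each statistic line
has the form `<name> <value> # <description>`, as in this log; parse_stats is
a hypothetical helper, not a SimpleScalar tool.  The sample values are copied
from the run above.

```python
def parse_stats(text):
    """Collect numeric `name value # description` lines into a dict.

    Lines whose value is not a plain number (hex addresses, `760k`,
    `<error: ...>`) and commented-out option lines are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue
    return stats


SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   9124462 # total simulation time in cycles
sim_IPC                      1.4599 # instructions per cycle
sim_CPI                      0.6850 # cycles per instruction
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
"""

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

# The log prints four decimal places, so allow half a unit in the last place.
TOL = 5e-5
assert abs(ipc - stats["sim_IPC"]) < TOL
assert abs(cpi - stats["sim_CPI"]) < TOL
assert abs(dl1_miss_rate - stats["dl1.miss_rate"]) < TOL

print(f"IPC={ipc:.4f} CPI={cpi:.4f} dl1 miss rate={dl1_miss_rate:.4f}")
```

The same check can be pointed at any of the runs in this log by reading the
redirected output file instead of the inline sample.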

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:57:34 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12549212 # total simulation time in cycles
sim_IPC                      1.7036 # instructions per cycle
sim_CPI                      0.5870 # cycles per instruction
sim_exec_BW                  1.7093 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49055558 # cumulative IFQ occupancy
IFQ_fcount                 11644717 # cumulative IFQ full count
ifq_occupancy                3.9091 # avg IFQ occupancy (insn's)
ifq_rate                     1.7093 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2870 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199358222 # cumulative RUU occupancy
RUU_fcount                 12424592 # cumulative RUU full count
ruu_occupancy               15.8861 # avg RUU occupancy (insn's)
ruu_rate                     1.7093 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2940 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63913919 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0931 # avg LSQ occupancy (insn's)
lsq_rate                     1.7093 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9797 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291055020 # total number of slip cycles
avg_sim_slip                13.6139 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886466 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:57:48 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861147 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37025766 # total simulation time in cycles
sim_IPC                      0.7524 # instructions per cycle
sim_CPI                      1.3290 # cycles per instruction
sim_exec_BW                  0.7525 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 148037945 # cumulative IFQ occupancy
IFQ_fcount                 37009246 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7525 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3134 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 592153955 # cumulative RUU occupancy
RUU_fcount                 37008536 # cumulative RUU full count
ruu_occupancy               15.9930 # avg RUU occupancy (insn's)
ruu_rate                     0.7525 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.2538 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 180009466 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8617 # avg LSQ occupancy (insn's)
lsq_rate                     0.7525 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4609 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  808672132 # total number of slip cycles
avg_sim_slip                29.0266 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017736 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
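
The derived rates in a statistics block (sim_IPC, sim_CPI, the cache
miss rates) are ratios of the raw counters printed alongside them. The
following Python sketch shows one way to read a block like the one above
and recompute such a ratio. The names parse_stats and ratio are
hypothetical, the line pattern is an assumption based on the
"name value # description" layout seen here, and non-numeric values
(hex addresses, sizes such as 1480k, error strings) are simply skipped.

```python
import re
from typing import Dict, Optional

# A statistic line: identifier, one value token, then '#' and a description.
_STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#")


def parse_stats(text: str) -> Dict[str, float]:
    """Collect every numeric statistic from a block of simulator output."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match is None:
            continue  # option lines, banners, help text
        try:
            stats[match.group(1)] = float(match.group(2))
        except ValueError:
            continue  # e.g. 0x00400000 or 1480k
    return stats


def ratio(stats: Dict[str, float], numerator: str, denominator: str) -> Optional[float]:
    """Return numerator/denominator, or None if either is missing or the denominator is zero."""
    num = stats.get(numerator)
    den = stats.get(denominator)
    if num is None or not den:
        return None
    return num / den
```

Applied to the counters above, ratio(stats, "sim_num_insn", "sim_cycle")
gives about 0.7524, in agreement with the reported sim_IPC, and
ratio(stats, "dl1.misses", "dl1.accesses") gives about 0.2972, in
agreement with dl1.miss_rate.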

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:58:11 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13149001 # total number of instructions executed
sim_total_refs              4034210 # total number of loads and stores executed
sim_total_loads             3020646 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6751422 # total simulation time in cycles
sim_IPC                      1.9414 # instructions per cycle
sim_CPI                      0.5151 # cycles per instruction
sim_exec_BW                  1.9476 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25138133 # cumulative IFQ occupancy
IFQ_fcount                  6158917 # cumulative IFQ full count
ifq_occupancy                3.7234 # avg IFQ occupancy (insn's)
ifq_rate                     1.9476 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9118 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9122 # fraction of time (cycle's) IFQ was full
RUU_count                 103828414 # cumulative RUU occupancy
RUU_fcount                  5595044 # cumulative RUU full count
ruu_occupancy               15.3787 # avg RUU occupancy (insn's)
ruu_rate                     1.9476 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8963 # avg RUU occupant latency (cycle's)
ruu_full                     0.8287 # fraction of time (cycle's) RUU was full
LSQ_count                  31698308 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6951 # avg LSQ occupancy (insn's)
lsq_rate                     1.9476 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4107 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152577331 # total number of slip cycles
avg_sim_slip                11.6408 # the average slip between issue and retirement
bpred_bimod.lookups         1010976 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000660 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10190 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189507 # total number of accesses
il1.hits                   13189329 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013166 # total number of accesses
dl1.hits                    4007439 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5247 # total number of hits
ul2.misses                     1150 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1798 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189507 # total number of accesses
itlb.hits                  13189501 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357525 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:58:19 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
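
The `<name>:<nsets>:<bsize>:<assoc>:<repl>` format described above can be checked with a short script. This is a minimal sketch in Python; `CacheConfig` and `parse_cache_config` are hypothetical names, not part of SimpleScalar. It assumes `<bsize>` is in bytes and computes total capacity as sets x block size x associativity.

```python
from typing import NamedTuple

# Replacement-policy codes as listed in the help text.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity: sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse a '<name>:<nsets>:<bsize>:<assoc>:<repl>' string."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configurations taken from the command lines in this log.
    for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l", "ul2:512:128:8:l"]:
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
              f"{cfg.assoc}-way, {REPLACEMENT[cfg.repl]}")
```

Under that assumption, both L2 configurations used in these runs (`ul2:1024:64:8:l` and `ul2:512:128:8:l`) have the same 512 KB capacity and differ only in block size and set count.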



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264687 # total number of instructions executed
sim_total_refs              4823980 # total number of loads and stores executed
sim_total_loads             2865502 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3197013 # total number of branches executed
sim_cycle                   6271397 # total simulation time in cycles
sim_IPC                      1.8467 # instructions per cycle
sim_CPI                      0.5415 # cycles per instruction
sim_exec_BW                  1.9557 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18419801 # cumulative IFQ occupancy
IFQ_fcount                  3776697 # cumulative IFQ full count
ifq_occupancy                2.9371 # avg IFQ occupancy (insn's)
ifq_rate                     1.9557 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5019 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6022 # fraction of time (cycle's) IFQ was full
RUU_count                  75923701 # cumulative RUU occupancy
RUU_fcount                  3173749 # cumulative RUU full count
ruu_occupancy               12.1063 # avg RUU occupancy (insn's)
ruu_rate                     1.9557 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1904 # avg RUU occupant latency (cycle's)
ruu_full                     0.5061 # fraction of time (cycle's) RUU was full
LSQ_count                  31823942 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0745 # avg LSQ occupancy (insn's)
lsq_rate                     1.9557 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5948 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121814030 # total number of slip cycles
avg_sim_slip                10.5180 # the average slip between issue and retirement
bpred_bimod.lookups         3257723 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442008 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820333 # total number of accesses
il1.hits                   12820116 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497787 # total number of accesses
dl1.hits                    4486582 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      16388 # total number of hits
ul2.misses                     2113 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1142 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820333 # total number of accesses
itlb.hits                  12820326 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514760 # total number of accesses
dtlb.hits                   4514694 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918398 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
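
The derived figures in the statistics block above follow from the raw counters, as the stat descriptions indicate (for example, miss rate is misses/ref). The Python sketch below recomputes a few of them from lines copied out of this run. `parse_stats` is a hypothetical helper, and the formulas for `sim_IPC` and `sim_CPI` are the usual definitions (committed instructions per cycle and its inverse), taken here as an assumption that the printed values bear out.

```python
def parse_stats(text: str) -> dict:
    """Collect 'name value # description' lines into a name -> float mapping.

    Lines whose value is not a plain number (hex addresses, '292k') are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue
    return stats


SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6271397 # total simulation time in cycles
dl1.accesses                4497787 # total number of accesses
dl1.misses                    11205 # total number of misses
ul2.accesses                  18501 # total number of accesses
ul2.misses                     2113 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    ipc = s["sim_num_insn"] / s["sim_cycle"]
    cpi = s["sim_cycle"] / s["sim_num_insn"]
    dl1_miss_rate = s["dl1.misses"] / s["dl1.accesses"]
    ul2_miss_rate = s["ul2.misses"] / s["ul2.accesses"]
    print(f"IPC {ipc:.4f}  CPI {cpi:.4f}")
    print(f"dl1 miss rate {dl1_miss_rate:.4f}  ul2 miss rate {ul2_miss_rate:.4f}")
```

The same helper can be pointed at any of the other statistics blocks in this log to compare runs.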

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:58:27 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 10 # total simulation time in seconds
sim_inst_rate          1332090.8000 # simulation speed (in insts/sec)
sim_total_insn             13375420 # total number of instructions executed
sim_total_refs              6748328 # total number of loads and stores executed
sim_total_loads             3824305 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390631 # total number of branches executed
sim_cycle                   8716043 # total simulation time in cycles
sim_IPC                      1.5283 # instructions per cycle
sim_CPI                      0.6543 # cycles per instruction
sim_exec_BW                  1.5346 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  33982220 # cumulative IFQ occupancy
IFQ_fcount                  8344899 # cumulative IFQ full count
ifq_occupancy                3.8988 # avg IFQ occupancy (insn's)
ifq_rate                     1.5346 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.5406 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9574 # fraction of time (cycle's) IFQ was full
RUU_count                 136841664 # cumulative RUU occupancy
RUU_fcount                  8204005 # cumulative RUU full count
ruu_occupancy               15.7000 # avg RUU occupancy (insn's)
ruu_rate                     1.5346 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.2308 # avg RUU occupant latency (cycle's)
ruu_full                     0.9413 # fraction of time (cycle's) RUU was full
LSQ_count                  71194179 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1682 # avg LSQ occupancy (insn's)
lsq_rate                     1.5346 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.3228 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  227750489 # total number of slip cycles
avg_sim_slip                17.0972 # the average slip between issue and retirement
bpred_bimod.lookups          390942 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400102 # total number of accesses
il1.hits                   13399375 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169486 # total number of accesses
dl1.hits                    5881762 # total number of hits
dl1.misses                   287724 # total number of misses
dl1.replacements             286700 # total number of replacements
dl1.writebacks               143443 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431894 # total number of accesses
ul2.hits                     415573 # total number of hits
ul2.misses                    16321 # total number of misses
ul2.replacements              12225 # total number of replacements
ul2.writebacks                 9594 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0378 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0283 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0222 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400102 # total number of accesses
itlb.hits                  13400083 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766232 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:58:37 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
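
As a rough illustration of the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, the following Python sketch splits a config string into its fields and derives the total capacity as sets x block size x associativity. The names `CacheConfig`, `parse_cache_config` and `size_bytes` are assumptions introduced here, not part of SimpleScalar.

```python
from typing import NamedTuple

# Replacement codes as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def size_bytes(self):
        # Capacity = number of sets * block size * ways per set.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (hypothetical helper)."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {len(fields)}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for text in ["dl1:4096:32:1:l", "dl1:512:64:2:l"]:
    cfg = parse_cache_config(text)
    print(f"{cfg.name}: {cfg.size_bytes} bytes, {REPL_POLICIES[cfg.repl]}")
```

For a TLB config such as `dtlb:32:4096:4:l`, the same product gives the amount of memory the TLB can map rather than a storage size, since `<bsize>` is then the page size.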

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450128 # total number of instructions executed
sim_total_refs              6589107 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12501795 # total simulation time in cycles
sim_IPC                      1.7101 # instructions per cycle
sim_CPI                      0.5848 # cycles per instruction
sim_exec_BW                  1.7158 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48877884 # cumulative IFQ occupancy
IFQ_fcount                 11599918 # cumulative IFQ full count
ifq_occupancy                3.9097 # avg IFQ occupancy (insn's)
ifq_rate                     1.7158 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2787 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 198633881 # cumulative RUU occupancy
RUU_fcount                 12378247 # cumulative RUU full count
ruu_occupancy               15.8884 # avg RUU occupancy (insn's)
ruu_rate                     1.7158 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2603 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63689521 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0944 # avg LSQ occupancy (insn's)
lsq_rate                     1.7158 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9692 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290106248 # total number of slip cycles
avg_sim_slip                13.5695 # the average slip between issue and retirement
bpred_bimod.lookups         1654487 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           57 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483042 # total number of accesses
il1.hits                   21482863 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943699 # total number of accesses
dl1.hits                    4940568 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       2843 # total number of hits
ul2.misses                     1258 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3068 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483042 # total number of accesses
itlb.hits                  21483036 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886422 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
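
The derived statistics above (IPC, miss rates) can be cross-checked against the raw counters. The following Python sketch parses `name value # description` lines and recomputes two of them from values copied from this run. The regular expression and the function name `parse_stats` are assumptions for illustration; the simulator's own output format is not formally specified here.

```python
import re

# A statistic line is "<name> <value> # <description>". Names start with a
# letter or underscore, which skips option lines that begin with '-' or '#'.
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(.*?)\s+#\s*(.*)$")


def parse_stats(lines):
    """Return {name: value}; values are int, float, or the raw string."""
    stats = {}
    for line in lines:
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, raw = match.group(1), match.group(2)
        try:
            value = int(raw, 0)          # base 0 also accepts 0x... addresses
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw              # e.g. "120k" or "<error: divide by zero>"
        stats[name] = value
    return stats


sample = """\
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12501795 # total simulation time in cycles
sim_IPC                      1.7101 # instructions per cycle
ul2.accesses                   4101 # total number of accesses
ul2.misses                     1258 # total number of misses
ul2.miss_rate                0.3068 # miss rate (i.e., misses/ref)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate
""".splitlines()

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]
print(f"IPC {ipc:.4f} (reported {stats['sim_IPC']})")
print(f"ul2 miss rate {miss_rate:.4f} (reported {stats['ul2.miss_rate']})")
```

Non-numeric values are kept as strings so that entries like the divide-by-zero marker remain visible instead of being dropped.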

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:58:51 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 22 # total simulation time in seconds
sim_inst_rate          1266349.6364 # simulation speed (in insts/sec)
sim_total_insn             27860968 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203829 # total number of loads executed
sim_total_stores       1449774.0000 # total number of stores executed
sim_total_branches           481844 # total number of branches executed
sim_cycle                  36064039 # total simulation time in cycles
sim_IPC                      0.7725 # instructions per cycle
sim_CPI                      1.2945 # cycles per instruction
sim_exec_BW                  0.7725 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 144204915 # cumulative IFQ occupancy
IFQ_fcount                 36050990 # cumulative IFQ full count
ifq_occupancy                3.9986 # avg IFQ occupancy (insn's)
ifq_rate                     0.7725 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.1759 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 576821647 # cumulative RUU occupancy
RUU_fcount                 36050322 # cumulative RUU full count
ruu_occupancy               15.9944 # avg RUU occupancy (insn's)
ruu_rate                     0.7725 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.7036 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 175389484 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8633 # avg LSQ occupancy (insn's)
lsq_rate                     0.7725 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2952 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  788718624 # total number of slip cycles
avg_sim_slip                28.3104 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6083842 # total number of hits
dl1.misses                  2569556 # total number of misses
dl1.replacements            2568532 # total number of replacements
dl1.writebacks               957260 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2969 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2968 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3527027 # total number of accesses
ul2.hits                    3492843 # total number of hits
ul2.misses                    34184 # total number of misses
ul2.replacements              30088 # total number of replacements
ul2.writebacks                10009 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0097 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0085 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0028 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017732 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:59:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
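
  As a rough illustration of the sample predictors above, the following
  Python sketch names a 2-level configuration from its four values.  The
  function name is hypothetical, and the argument order follows the
  -bpred:2lev option as printed in this log (<l1size> <l2size> <hist_size>
  <xor>), i.e., N, M, W, X; it is a reading of the table, not simulator code.

```python
# Sketch: classify a 2-level predictor configuration using the sample
# predictor table.  Argument order is N, M, W, X (l1size, l2size,
# hist_size, xor), matching the -bpred:2lev option description.

def classify_2lev(l1size, l2size, hist_size, xor):
    pattern_entries = 1 << hist_size            # 2^W
    if l1size == 1:
        if l2size == pattern_entries:
            return "gshare" if xor else "GAg"
        if l2size > pattern_entries and not xor:
            return "GAp"
    elif not xor:
        if l2size == pattern_entries:
            return "PAg"
        if l2size == 1 << (l1size + hist_size):  # 2^(N+W), as in the table
            return "PAp"
    return "other"


if __name__ == "__main__":
    # The values shown for -bpred:2lev in this log are "1 1024 8 0".
    print(classify_2lev(1, 1024, 8, 0))
```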

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
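
  A short Python sketch of how a <config> string breaks down, assuming the
  usual relation capacity = nsets * bsize * assoc.  The class and function
  names are made up for this example and do not come from the simulator.

```python
# Sketch: split a <name>:<nsets>:<bsize>:<assoc>:<repl> string and derive
# the total capacity.  CacheConfig and parse_cache_config are hypothetical
# names used only for illustration.
from typing import NamedTuple

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # sets * block size (bytes) * ways
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPLACEMENT:
        raise ValueError("unknown replacement policy: " + repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc),
                       REPLACEMENT[repl])


if __name__ == "__main__":
    # Configurations taken from the command lines in this log.
    for cfg in ["dl1:512:64:2:l", "ul2:256:256:8:l"]:
        c = parse_cache_config(cfg)
        print(c.name, c.repl, c.capacity_bytes // 1024, "KB")
```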

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149022 # total number of instructions executed
sim_total_refs              4034214 # total number of loads and stores executed
sim_total_loads             3020650 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   6754868 # total simulation time in cycles
sim_IPC                      1.9404 # instructions per cycle
sim_CPI                      0.5154 # cycles per instruction
sim_exec_BW                  1.9466 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25144850 # cumulative IFQ occupancy
IFQ_fcount                  6160598 # cumulative IFQ full count
ifq_occupancy                3.7225 # avg IFQ occupancy (insn's)
ifq_rate                     1.9466 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9123 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9120 # fraction of time (cycle's) IFQ was full
RUU_count                 103859346 # cumulative RUU occupancy
RUU_fcount                  5596693 # cumulative RUU full count
ruu_occupancy               15.3755 # avg RUU occupancy (insn's)
ruu_rate                     1.9466 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8986 # avg RUU occupant latency (cycle's)
ruu_full                     0.8285 # fraction of time (cycle's) RUU was full
LSQ_count                  31710394 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6945 # avg LSQ occupancy (insn's)
lsq_rate                     1.9466 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4116 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152616363 # total number of slip cycles
avg_sim_slip                11.6438 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189528 # total number of accesses
il1.hits                   13189350 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013184 # total number of accesses
dl1.hits                    4007457 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5811 # total number of hits
ul2.misses                      586 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0916 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189528 # total number of accesses
itlb.hits                  13189522 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357617 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
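
The statistics block above follows a "name value # description" layout, so
derived figures can be cross-checked mechanically.  The Python sketch below
is one possible way to do that; the regular expression and function name are
assumptions for this illustration, and the sample lines are copied from the
run above.

```python
# Sketch: parse "name value # description" statistic lines and recompute
# IPC and CPI from the raw counters.  Not part of SimpleScalar.
import re

STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#\s")

SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6754868 # total simulation time in cycles
sim_IPC                      1.9404 # instructions per cycle
sim_CPI                      0.5154 # cycles per instruction
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate
ld_text_base             0x00400000 # program text (code) segment base
"""


def parse_stats(text):
    stats = {}
    for line in text.splitlines():
        if line.startswith(("-", "#")):
            continue                    # skip option lines, keep statistics
        m = STAT_LINE.match(line)
        if not m:
            continue
        name, raw = m.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw           # e.g. hex addresses, error strings
    return stats


if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    ipc = s["sim_num_insn"] / s["sim_cycle"]
    print("IPC %.4f (reported %.4f)" % (ipc, s["sim_IPC"]))
    print("CPI %.4f (reported %.4f)" % (1.0 / ipc, s["sim_CPI"]))
```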

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:59:22 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264803 # total number of instructions executed
sim_total_refs              4823975 # total number of loads and stores executed
sim_total_loads             2865500 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6281477 # total simulation time in cycles
sim_IPC                      1.8438 # instructions per cycle
sim_CPI                      0.5424 # cycles per instruction
sim_exec_BW                  1.9525 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18450992 # cumulative IFQ occupancy
IFQ_fcount                  3784449 # cumulative IFQ full count
ifq_occupancy                2.9374 # avg IFQ occupancy (insn's)
ifq_rate                     1.9525 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5044 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6025 # fraction of time (cycle's) IFQ was full
RUU_count                  76049498 # cumulative RUU occupancy
RUU_fcount                  3180957 # cumulative RUU full count
ruu_occupancy               12.1069 # avg RUU occupancy (insn's)
ruu_rate                     1.9525 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2006 # avg RUU occupant latency (cycle's)
ruu_full                     0.5064 # fraction of time (cycle's) RUU was full
LSQ_count                  31853425 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0710 # avg LSQ occupancy (insn's)
lsq_rate                     1.9525 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5971 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121969014 # total number of slip cycles
avg_sim_slip                10.5314 # the average slip between issue and retirement
bpred_bimod.lookups         3257753 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442040 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820380 # total number of accesses
il1.hits                   12820163 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497786 # total number of accesses
dl1.hits                    4486581 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      17430 # total number of hits
ul2.misses                     1071 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0579 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820380 # total number of accesses
itlb.hits                  12820373 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918584 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:59:29 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
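
The range grammar above can be illustrated with a small parser. This is a sketch in Python, not the simulator's own code: the function name and the returned tuple layout are assumptions, and values are kept as strings because they may be numbers or program symbols.

```python
def parse_ptrace_range(spec):
    """Split '<start>:<end>' into two (prefix, value) pairs.

    An omitted end is returned as None.  The prefix is '@' (address),
    '#' (cycle count), '+' (relative to start, end only) or '' for an
    instruction count.
    """
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    def parse_end(text, allow_plus):
        if text == "":
            return None  # this end of the range was left out
        prefix = text[0] if text[0] in "@#+" else ""
        if prefix == "+" and not allow_plus:
            raise ValueError("'+' is only valid on the end value")
        value = text[len(prefix):]
        if not value:
            raise ValueError("missing value after prefix")
        return prefix, value

    return parse_end(start_text, False), parse_end(end_text, True)


for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
    print(example, parse_ptrace_range(example))
```

Resolving symbols such as `main` and applying the `+` offset are left out; they depend on the loaded program.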

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
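
The sample predictors list their values in the order N, W, M, X, while the `-bpred:2lev` option takes <l1size> <l2size> <hist_size> <xor>, i.e. N, M, W, X. The Python sketch below maps a sample predictor name to an option string in the option's order. The function name and its keyword arguments are hypothetical, and the constraints are only those stated in the table above.

```python
def two_level_option(kind, w, n=1, m=None):
    """Build a -bpred:2lev option string for one of the sample predictors."""
    if kind == "GAg":
        l1, l2, xor = 1, 2 ** w, 0
    elif kind == "gshare":
        l1, l2, xor = 1, 2 ** w, 1
    elif kind == "PAg":
        l1, l2, xor = n, 2 ** w, 0
    elif kind == "PAp":
        l1, l2, xor = n, 2 ** (n + w), 0
    elif kind == "GAp":
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        l1, l2, xor = 1, m, 0
    else:
        raise ValueError(f"unknown predictor kind: {kind}")
    # Option order is <l1size> <l2size> <hist_size> <xor>.
    return f"-bpred:2lev {l1} {l2} {w} {xor}"


print(two_level_option("gshare", 10))
print(two_level_option("GAp", 8, m=1024))
```

The second call reproduces the `1 1024 8 0` default shown in the option dumps. The option only takes effect when `-bpred 2lev` (or `comb`) is also selected.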

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
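
A short Python sketch of how a <config> string breaks down, and how total capacity follows from sets, block size and associativity. The class and function names are hypothetical; only the field layout and the replacement codes come from the help text above.

```python
from typing import NamedTuple

REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Total data capacity: sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {len(parts)}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy: {repl}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


cfg = parse_cache_config("il1:512:64:2:l")
print(cfg.name, cfg.capacity_bytes, REPLACEMENT[cfg.repl])
```

For `il1:512:64:2:l` this gives 512 x 64 x 2 bytes, i.e. a 64 KB two-way LRU cache.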

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375790 # total number of instructions executed
sim_total_refs              6748328 # total number of loads and stores executed
sim_total_loads             3824305 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                   8787732 # total simulation time in cycles
sim_IPC                      1.5159 # instructions per cycle
sim_CPI                      0.6597 # cycles per instruction
sim_exec_BW                  1.5221 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  34245299 # cumulative IFQ occupancy
IFQ_fcount                  8410652 # cumulative IFQ full count
ifq_occupancy                3.8969 # avg IFQ occupancy (insn's)
ifq_rate                     1.5221 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.5602 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9571 # fraction of time (cycle's) IFQ was full
RUU_count                 137898839 # cumulative RUU occupancy
RUU_fcount                  8269706 # cumulative RUU full count
ruu_occupancy               15.6922 # avg RUU occupancy (insn's)
ruu_rate                     1.5221 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.3096 # avg RUU occupant latency (cycle's)
ruu_full                     0.9411 # fraction of time (cycle's) RUU was full
LSQ_count                  71809988 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1716 # avg LSQ occupancy (insn's)
lsq_rate                     1.5221 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.3687 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  229418204 # total number of slip cycles
avg_sim_slip                17.2224 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380511 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89851 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90457 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89851 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169507 # total number of accesses
dl1.hits                    5881781 # total number of hits
dl1.misses                   287726 # total number of misses
dl1.replacements             286702 # total number of replacements
dl1.writebacks               143445 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431898 # total number of accesses
ul2.hits                     423631 # total number of hits
ul2.misses                     8267 # total number of misses
ul2.replacements               6219 # total number of replacements
ul2.writebacks                 4885 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0191 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0144 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0113 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736809 # total number of hits
dtlb.misses                    4189 # total number of misses
dtlb.replacements              4061 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766212 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
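
The derived figures in a statistics block can be recomputed from the raw counters. The Python sketch below parses `name value # description` lines and re-derives IPC, CPI and the dl1 miss rate from the counters of the run above. The function name is hypothetical, and non-numeric values (hex addresses, `760k`, error markers) are simply skipped.

```python
def parse_stats(lines):
    """Collect numeric 'name value # description' statistics into a dict."""
    stats = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # not a plain name/value line
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            pass  # e.g. 0x00400000 or 760k
    return stats


sample = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   8787732 # total simulation time in cycles
dl1.accesses                6169507 # total number of accesses
dl1.misses                   287726 # total number of misses
""".splitlines()

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"IPC={ipc:.4f} CPI={cpi:.4f} dl1.miss_rate={dl1_miss_rate:.4f}")
```

The recomputed values should agree with the reported `sim_IPC`, `sim_CPI` and `dl1.miss_rate` to four decimal places.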

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:59:40 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450141 # total number of instructions executed
sim_total_refs              6589113 # total number of loads and stores executed
sim_total_loads             4939056 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12510800 # total simulation time in cycles
sim_IPC                      1.7089 # instructions per cycle
sim_CPI                      0.5852 # cycles per instruction
sim_exec_BW                  1.7145 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48905921 # cumulative IFQ occupancy
IFQ_fcount                 11606733 # cumulative IFQ full count
ifq_occupancy                3.9091 # avg IFQ occupancy (insn's)
ifq_rate                     1.7145 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2800 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9277 # fraction of time (cycle's) IFQ was full
RUU_count                 198742013 # cumulative RUU occupancy
RUU_fcount                 12384294 # cumulative RUU full count
ruu_occupancy               15.8856 # avg RUU occupancy (insn's)
ruu_rate                     1.7145 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2653 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  63724807 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0936 # avg LSQ occupancy (insn's)
lsq_rate                     1.7145 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9708 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290246223 # total number of slip cycles
avg_sim_slip                13.5761 # the average slip between issue and retirement
bpred_bimod.lookups         1654491 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483056 # total number of accesses
il1.hits                   21482877 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943957 # total number of accesses
dl1.hits                    4940826 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3460 # total number of hits
ul2.misses                      641 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1563 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483056 # total number of accesses
itlb.hits                  21483050 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886482 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
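
In the run above `bpred_bimod.bpred_jr_non_ras_rate.PP` is reported as `<error: divide by zero>` because `jr_non_ras_seen.PP` is 0. When post-processing such logs, a guarded ratio avoids the same failure. This is a minimal Python sketch with hypothetical helper names; it does not reflect how the simulator evaluates its formulas.

```python
def safe_rate(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def format_rate(rate):
    """Format a rate with four decimals, or 'n/a' when it is undefined."""
    return "n/a" if rate is None else f"{rate:.4f}"


# non-RAS JR hits / non-RAS JRs seen (0 / 0 in the run above)
print(format_rate(safe_rate(0, 0)))
# RAS hits / RAS predictions used (45 / 52 in the run above)
print(format_rate(safe_rate(45, 52)))
```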

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:59:53 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
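
  The range grammar above can be sketched as a small parser. This is an
  illustrative sketch only, not the simulator's own code: the function names
  parse_endpoint and parse_ptrace_range are hypothetical, and values are kept
  as strings because an endpoint may be a number or a program symbol.

```python
def parse_endpoint(text, allow_relative):
    """Classify one endpoint of a pipetrace range (hypothetical helper)."""
    if text == "":
        return None  # endpoint omitted, so the range is open on this side
    if text[0] == "@":
        return ("address", text[1:])
    if text[0] == "#":
        return ("cycle", text[1:])
    if text[0] == "+":
        if not allow_relative:
            raise ValueError("'+' is only valid on the end of a range")
        return ("relative", text[1:])
    # No prefix: the help text says this is an instruction count.
    return ("instruction", text)


def parse_ptrace_range(spec):
    """Split '<start>:<end>' and classify both endpoints."""
    start, sep, end = spec.partition(":")
    if sep != ":":
        raise ValueError("range must contain ':'")
    return parse_endpoint(start, False), parse_endpoint(end, True)


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(spec, "->", parse_ptrace_range(spec))
```

  Resolving a relative end (for example 1000:+100 to 1000:1100) or a symbol
  such as main needs the loaded program, so the sketch leaves that step out.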

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
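
  As a rough illustration of the <config> format, the sketch below splits a
  config string into its five fields and derives the total capacity as
  nsets * bsize * assoc. The names CacheConfig, parse_cache_config and
  capacity_bytes are hypothetical and are not part of SimpleScalar.

```python
from collections import namedtuple

# Field names follow the <name>:<nsets>:<bsize>:<assoc>:<repl> format.
CacheConfig = namedtuple("CacheConfig", "name nsets bsize assoc repl")

# Replacement-policy letters as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(spec):
    """Parse a cache/TLB config string into a CacheConfig tuple."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError("unknown replacement policy: " + repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity: sets x block size x ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "ul2:256:256:8:l", "ul2:128:256:8:l"):
        cfg = parse_cache_config(spec)
        print(cfg.name, capacity_bytes(cfg) // 1024, "KB", REPL_POLICIES[cfg.repl])
```

  For a TLB config the <bsize> field is the page size, so the same product
  gives the address range the TLB can map rather than a storage size.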

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861807 # total number of instructions executed
sim_total_refs              8653608 # total number of loads and stores executed
sim_total_loads             7203833 # total number of loads executed
sim_total_stores       1449775.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  36438253 # total simulation time in cycles
sim_IPC                      0.7646 # instructions per cycle
sim_CPI                      1.3079 # cycles per instruction
sim_exec_BW                  0.7646 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 145688896 # cumulative IFQ occupancy
IFQ_fcount                 36421881 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7646 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2290 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 582758281 # cumulative RUU occupancy
RUU_fcount                 36421144 # cumulative RUU full count
ruu_occupancy               15.9930 # avg RUU occupancy (insn's)
ruu_rate                     0.7646 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.9160 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 175762275 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8236 # avg LSQ occupancy (insn's)
lsq_rate                     0.7646 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3084 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  795024417 # total number of slip cycles
avg_sim_slip                28.5367 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          123 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860802 # total number of accesses
il1.hits                   27860591 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5940463 # total number of hits
dl1.misses                  2712938 # total number of misses
dl1.replacements            2711914 # total number of replacements
dl1.writebacks               955399 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3135 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3134 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3668548 # total number of accesses
ul2.hits                    3651434 # total number of hits
ul2.misses                    17114 # total number of misses
ul2.replacements              15066 # total number of replacements
ul2.writebacks                 5007 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0047 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0041 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0014 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860802 # total number of accesses
itlb.hits                  27860796 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017796 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
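
The derived rates in a statistics block like the one above can be rechecked
from the raw counters. The sketch below is a minimal example, assuming the
"name value # description" line layout shown here; parse_stats is a
hypothetical helper, and the sample lines are copied from this run.

```python
def parse_stats(lines):
    """Collect numeric 'name value # description' statistics into a dict."""
    stats = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2 or fields[0].startswith("-"):
            continue  # skip blank lines, option lines, and error markers
        name, raw = fields
        try:
            stats[name] = float(raw)
        except ValueError:
            continue  # non-decimal values such as 0x00400000 or 1480k
    return stats


sample = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  36438253 # total simulation time in cycles
sim_IPC                      0.7646 # instructions per cycle
dl1.accesses                8653401 # total number of accesses
dl1.misses                  2712938 # total number of misses
dl1.miss_rate                0.3135 # miss rate (i.e., misses/ref)
""".splitlines()

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

if __name__ == "__main__":
    print("IPC recomputed:", round(ipc, 4), "reported:", stats["sim_IPC"])
    print("dl1 miss rate recomputed:", round(dl1_miss_rate, 4),
          "reported:", stats["dl1.miss_rate"])
```

Values are read as floats because some counters in this log are printed with
a decimal part (for example sim_num_stores).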


sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:00:16 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13149022 # total number of instructions executed
sim_total_refs              4034214 # total number of loads and stores executed
sim_total_loads             3020650 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   6754868 # total simulation time in cycles
sim_IPC                      1.9404 # instructions per cycle
sim_CPI                      0.5154 # cycles per instruction
sim_exec_BW                  1.9466 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25144850 # cumulative IFQ occupancy
IFQ_fcount                  6160598 # cumulative IFQ full count
ifq_occupancy                3.7225 # avg IFQ occupancy (insn's)
ifq_rate                     1.9466 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9123 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9120 # fraction of time (cycle's) IFQ was full
RUU_count                 103859346 # cumulative RUU occupancy
RUU_fcount                  5596693 # cumulative RUU full count
ruu_occupancy               15.3755 # avg RUU occupancy (insn's)
ruu_rate                     1.9466 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8986 # avg RUU occupant latency (cycle's)
ruu_full                     0.8285 # fraction of time (cycle's) RUU was full
LSQ_count                  31710394 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6945 # avg LSQ occupancy (insn's)
lsq_rate                     1.9466 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4116 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152616363 # total number of slip cycles
avg_sim_slip                11.6438 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189528 # total number of accesses
il1.hits                   13189350 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013184 # total number of accesses
dl1.hits                    4007457 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5811 # total number of hits
ul2.misses                      586 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0916 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189528 # total number of accesses
itlb.hits                  13189522 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357617 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate


sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:00:24 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
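
The -mem:lat and -mem:width values above can be combined into a per-block miss penalty. The sketch below assumes the chunked-transfer model of stock SimpleScalar 3.0 (first chunk at <first_chunk> cycles, each further bus-width chunk at <inter_chunk> cycles). The -mem:maxBurstLength and -mem:minBurstLength options are not covered by that model and are ignored here, so the simulated latency in this build may differ. The function name is hypothetical.

```python
def mem_access_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to transfer one block, assuming a simple chunked model.

    Assumption: burst-length limits are ignored entirely.
    """
    # Number of bus-width chunks needed to move the block (ceiling division).
    chunks = (block_bytes + bus_width - 1) // bus_width
    return first_chunk + inter_chunk * (chunks - 1)


# Values taken from the option dump: -mem:lat 60 3, -mem:width 16.
print(mem_access_latency(256, 16, 60, 3))  # 256-byte ul2 block -> 105
print(mem_access_latency(64, 16, 60, 3))   # 64-byte block -> 69
```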

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
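
The range grammar described above can be sketched as a small parser. This is an illustration of the documented syntax only, not the simulator's own parsing code; the function names and the tuple representation are hypothetical. Values are kept as strings because program symbols such as "main" are allowed.

```python
def parse_endpoint(text, is_end=False):
    """Classify one side of a pipetrace range; None means 'not specified'."""
    if text == "":
        return None
    kinds = {"@": "address", "#": "cycle"}
    if is_end:
        kinds["+"] = "relative"  # '+' is only meaningful on the end side
    kind = kinds.get(text[0])
    if kind is None:
        return ("inst", text)    # no prefix: instruction count
    return (kind, text[1:])


def parse_range(arg):
    """Split '<start>:<end>' and classify both endpoints."""
    start, sep, end = arg.partition(":")
    if not sep:
        raise ValueError("pipetrace range must contain ':'")
    return parse_endpoint(start), parse_endpoint(end, is_end=True)


for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
    print(example, parse_range(example))
```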

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X   (command-line order of -bpred:2lev)
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (listed as N, W, M, X):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
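
The sample-predictor table can be read as a classification rule. The sketch below follows the conditions exactly as printed above and takes its arguments in the table's order (N, W, M, X), which is inferred from the table contents and differs from the -bpred:2lev command-line order (<l1size> <l2size> <hist_size> <xor>). The function name is hypothetical.

```python
def classify_2lev(n, w, m, x):
    """Name a 2-level predictor config using the table's conditions.

    n: first-level entries, w: history width,
    m: second-level entries, x: xor flag (0 or 1).
    """
    if n == 1 and m == 2 ** w:
        return "gshare" if x else "GAg"
    if n == 1 and m > 2 ** w and not x:
        return "GAp"
    if n > 1 and not x:
        if m == 2 ** w:
            return "PAg"
        if m == 2 ** (n + w):  # condition as printed in the help text
            return "PAp"
    return "other"


# "-bpred:2lev 1 1024 8 0" from the option dump is N=1, M=1024, W=8, X=0.
print(classify_2lev(1, 8, 1024, 0))
```

Under this reading, the configuration shown in the option dump falls in the GAp row, since 1024 > 2^8.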

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
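
As an illustration of the <config> format, the sketch below splits a cache config string into its five fields and derives the total capacity as nsets * bsize * assoc. The class and function names are hypothetical and not part of the simulator.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str


REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity: sets x block size x ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc


# Configs taken from the option dump of this run.
print(capacity_bytes(parse_cache_config("dl1:512:64:2:l")))   # 65536 (64 KB)
print(capacity_bytes(parse_cache_config("ul2:128:256:8:l")))  # 262144 (256 KB)
```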

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264803 # total number of instructions executed
sim_total_refs              4823975 # total number of loads and stores executed
sim_total_loads             2865500 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6282829 # total simulation time in cycles
sim_IPC                      1.8434 # instructions per cycle
sim_CPI                      0.5425 # cycles per instruction
sim_exec_BW                  1.9521 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18450992 # cumulative IFQ occupancy
IFQ_fcount                  3784449 # cumulative IFQ full count
ifq_occupancy                2.9367 # avg IFQ occupancy (insn's)
ifq_rate                     1.9521 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5044 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6023 # fraction of time (cycle's) IFQ was full
RUU_count                  76049478 # cumulative RUU occupancy
RUU_fcount                  3180957 # cumulative RUU full count
ruu_occupancy               12.1043 # avg RUU occupancy (insn's)
ruu_rate                     1.9521 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2006 # avg RUU occupant latency (cycle's)
ruu_full                     0.5063 # fraction of time (cycle's) RUU was full
LSQ_count                  31853417 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0699 # avg LSQ occupancy (insn's)
lsq_rate                     1.9521 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5971 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121968986 # total number of slip cycles
avg_sim_slip                10.5314 # the average slip between issue and retirement
bpred_bimod.lookups         3257753 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       442040 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820381 # total number of accesses
il1.hits                   12820164 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497786 # total number of accesses
dl1.hits                    4486581 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      17421 # total number of hits
ul2.misses                     1080 # total number of misses
ul2.replacements                 70 # total number of replacements
ul2.writebacks                   42 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0584 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0038 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820381 # total number of accesses
itlb.hits                  12820374 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918588 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
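
The derived figures in this statistics block can be cross-checked from the raw counters. The sketch below parses "name value # description" lines into a dictionary and recomputes a few ratios; the parser, its name, and the embedded sample (copied from the block above) are illustrative only. Non-numeric values such as 0x00400000 or 292k are skipped.

```python
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_num_branches            3128464 # total number of branches committed
sim_cycle                   6282829 # total simulation time in cycles
dl1.accesses                4497786 # total number of accesses
dl1.misses                    11205 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""


def parse_stats(text):
    """Collect numeric 'name value' pairs, ignoring the '#' comment part."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2 or fields[0].startswith("-"):
            continue  # not a statistic (blank, help text, or option line)
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            pass  # hex addresses, sizes like '292k', etc.
    return stats


s = parse_stats(SAMPLE)
print("IPC          ", round(s["sim_num_insn"] / s["sim_cycle"], 4))
print("CPI          ", round(s["sim_cycle"] / s["sim_num_insn"], 4))
print("IPB          ", round(s["sim_num_insn"] / s["sim_num_branches"], 4))
print("dl1 miss rate", round(s["dl1.misses"] / s["dl1.accesses"], 4))
```

The recomputed values should agree with sim_IPC, sim_CPI, sim_IPB and dl1.miss_rate as printed, to four decimal places.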

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:00:31 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375799 # total number of instructions executed
sim_total_refs              6748332 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  10532092 # total simulation time in cycles
sim_IPC                      1.2648 # instructions per cycle
sim_CPI                      0.7906 # cycles per instruction
sim_exec_BW                  1.2700 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  41223238 # cumulative IFQ occupancy
IFQ_fcount                 10155012 # cumulative IFQ full count
ifq_occupancy                3.9141 # avg IFQ occupancy (insn's)
ifq_rate                     1.2700 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.0819 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9642 # fraction of time (cycle's) IFQ was full
RUU_count                 165818355 # cumulative RUU occupancy
RUU_fcount                 10014506 # cumulative RUU full count
ruu_occupancy               15.7441 # avg RUU occupancy (insn's)
ruu_rate                     1.2700 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.3969 # avg RUU occupant latency (cycle's)
ruu_full                     0.9509 # fraction of time (cycle's) RUU was full
LSQ_count                  86532754 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2161 # avg LSQ occupancy (insn's)
lsq_rate                     1.2700 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4694 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  272060464 # total number of slip cycles
avg_sim_slip                20.4236 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380511 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89851 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90457 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89851 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400106 # total number of accesses
il1.hits                   13399379 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169520 # total number of accesses
dl1.hits                    5881789 # total number of hits
dl1.misses                   287731 # total number of misses
dl1.replacements             286707 # total number of replacements
dl1.writebacks               143445 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431903 # total number of accesses
ul2.hits                     404016 # total number of hits
ul2.misses                    27887 # total number of misses
ul2.replacements              26863 # total number of replacements
ul2.writebacks                23516 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0646 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0622 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0544 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400106 # total number of accesses
itlb.hits                  13400087 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736809 # total number of hits
dtlb.misses                    4189 # total number of misses
dtlb.replacements              4061 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766260 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
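
The per-queue averages (occupancy, rate, latency, full fraction) appear to follow from the cumulative counters by simple division. The sketch below is an inference from the printed values of this run, not a statement of the simulator's internal formulas: it assumes occupancy = count / cycles, rate = executed instructions / cycles, latency = occupancy / rate, and full = full count / cycles. The function name is hypothetical.

```python
def queue_metrics(count, fcount, cycles, total_insn):
    """Derive average queue figures from cumulative counters (assumed formulas)."""
    occupancy = count / cycles      # average entries in use per cycle
    rate = total_insn / cycles      # average dispatches per cycle
    return {
        "occupancy": occupancy,
        "rate": rate,
        "latency": occupancy / rate,  # Little's law: entries / throughput
        "full": fcount / cycles,      # fraction of cycles the queue was full
    }


# Counters from this run: RUU_count, RUU_fcount, sim_cycle, sim_total_insn.
ruu = queue_metrics(165818355, 10014506, 10532092, 13375799)
for name, value in ruu.items():
    print(f"ruu_{name:<10} {value:.4f}")
```

With these assumptions the results line up with ruu_occupancy, ruu_rate, ruu_latency and ruu_full above; the same call with LSQ_count and LSQ_fcount can be used to check the lsq_* lines.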

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:00:43 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
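
The range syntax described above can be read mechanically. The Python sketch below is an illustration only, not SimpleScalar's own parser: `parse_ptrace_range` and the kind labels ("address", "cycle", "relative", "instruction") are hypothetical names chosen for this example. Values are kept as strings because the help text allows program symbols such as `main`.

```python
# Illustrative only: a hypothetical helper, not part of SimpleScalar.
PREFIX_KINDS = {"@": "address", "#": "cycle", "+": "relative"}


def parse_ptrace_range(spec):
    """Split '<start>:<end>' into (kind, value) pairs; an omitted end is None."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("a pipetrace range must contain ':'")

    def parse_end(text, allowed_prefixes):
        if not text:
            return None  # this end of the range was left out
        if text[0] in allowed_prefixes:
            return (PREFIX_KINDS[text[0]], text[1:])
        # No prefix: the value is an instruction count.
        return ("instruction", text)

    # Only the second argument may carry the relative '+' prefix.
    return parse_end(start, "@#"), parse_end(end, "@#+")


for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
    print(example, "->", parse_ptrace_range(example))
```

The five inputs are the range arguments from the examples above; a bare `:` yields `(None, None)`, which corresponds to tracing the entire execution.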

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
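
As a quick check on a `<config>` string, the total data capacity of a cache is the product of its set count, block size, and associativity. The following Python sketch is a hedged illustration, not simulator code: `CacheConfig`, `parse_cache_config`, and `capacity_bytes` are hypothetical names, and the two sample strings are the dl1 and dl2 settings from the option dump of this run.

```python
from collections import namedtuple

# Hypothetical container for the five fields of <name>:<nsets>:<bsize>:<assoc>:<repl>.
CacheConfig = namedtuple("CacheConfig", "name nsets bsize assoc repl")

REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(text):
    """Parse a cache config string into its named fields."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError("unknown replacement policy: " + repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Capacity = number of sets x block size x associativity."""
    return cfg.nsets * cfg.bsize * cfg.assoc


for text in ["dl1:512:64:2:l", "ul2:128:256:8:l"]:
    cfg = parse_cache_config(text)
    print(cfg.name, capacity_bytes(cfg) // 1024, "KB", REPL_POLICIES[cfg.repl])
```

With these inputs, `dl1:512:64:2:l` works out to 512 x 64 x 2 = 65536 bytes (64 KB) and `ul2:128:256:8:l` to 128 x 256 x 8 = 262144 bytes (256 KB).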

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450141 # total number of instructions executed
sim_total_refs              6589113 # total number of loads and stores executed
sim_total_loads             4939056 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12510800 # total simulation time in cycles
sim_IPC                      1.7089 # instructions per cycle
sim_CPI                      0.5852 # cycles per instruction
sim_exec_BW                  1.7145 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48905921 # cumulative IFQ occupancy
IFQ_fcount                 11606733 # cumulative IFQ full count
ifq_occupancy                3.9091 # avg IFQ occupancy (insn's)
ifq_rate                     1.7145 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2800 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9277 # fraction of time (cycle's) IFQ was full
RUU_count                 198742013 # cumulative RUU occupancy
RUU_fcount                 12384294 # cumulative RUU full count
ruu_occupancy               15.8856 # avg RUU occupancy (insn's)
ruu_rate                     1.7145 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2653 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  63724807 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0936 # avg LSQ occupancy (insn's)
lsq_rate                     1.7145 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9708 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290246223 # total number of slip cycles
avg_sim_slip                13.5761 # the average slip between issue and retirement
bpred_bimod.lookups         1654491 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483056 # total number of accesses
il1.hits                   21482877 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943957 # total number of accesses
dl1.hits                    4940826 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3460 # total number of hits
ul2.misses                      641 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1563 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483056 # total number of accesses
itlb.hits                  21483050 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886482 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
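
The derived figures in a statistics block can be cross-checked against the raw counters. The Python sketch below is an illustration under stated assumptions: it treats each statistic line as `name value # description`, which matches the lines in this log but is not a documented format, and `parse_stats` is a hypothetical helper. The sample lines are copied from the statistics block above.

```python
import re

# A few lines copied from the statistics block above.
SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12510800 # total simulation time in cycles
sim_IPC                      1.7089 # instructions per cycle
ul2.accesses                   4101 # total number of accesses
ul2.misses                      641 # total number of misses
ul2.miss_rate                0.1563 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
"""

# Assumed layout: a name starting with a letter, a value, then " # description".
STAT_LINE = re.compile(r"^([A-Za-z]\S*)\s+(.+?)\s+#\s")


def parse_stats(text):
    """Return {name: value}; values that are not plain numbers stay as strings."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. hex addresses or "<error: divide by zero>"
    return stats


stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]
print("recomputed IPC:", round(ipc, 4), "reported:", stats["sim_IPC"])
print("recomputed ul2 miss rate:", round(ul2_miss_rate, 4),
      "reported:", stats["ul2.miss_rate"])
```

Recomputing the ratios this way is a simple guard against transcription errors when results from several runs are copied into a table.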

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:00:56 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 22 # total simulation time in seconds
sim_inst_rate          1266349.6364 # simulation speed (in insts/sec)
sim_total_insn             27861807 # total number of instructions executed
sim_total_refs              8653608 # total number of loads and stores executed
sim_total_loads             7203833 # total number of loads executed
sim_total_stores       1449775.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  36438253 # total simulation time in cycles
sim_IPC                      0.7646 # instructions per cycle
sim_CPI                      1.3079 # cycles per instruction
sim_exec_BW                  0.7646 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 145688896 # cumulative IFQ occupancy
IFQ_fcount                 36421881 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7646 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2290 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 582758281 # cumulative RUU occupancy
RUU_fcount                 36421144 # cumulative RUU full count
ruu_occupancy               15.9930 # avg RUU occupancy (insn's)
ruu_rate                     0.7646 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.9160 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 175762275 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8236 # avg LSQ occupancy (insn's)
lsq_rate                     0.7646 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3084 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  795024417 # total number of slip cycles
avg_sim_slip                28.5367 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          123 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860802 # total number of accesses
il1.hits                   27860591 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5940463 # total number of hits
dl1.misses                  2712938 # total number of misses
dl1.replacements            2711914 # total number of replacements
dl1.writebacks               955399 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3135 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3134 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3668548 # total number of accesses
ul2.hits                    3651434 # total number of hits
ul2.misses                    17114 # total number of misses
ul2.replacements              16090 # total number of replacements
ul2.writebacks                 5391 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0047 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0044 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860802 # total number of accesses
itlb.hits                  27860796 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017796 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:01:18 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with `@' designate an address
  range to be traced; those that start with `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
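
  The range grammar above can be sketched as a small parser. This is only an
  illustration of the format as described in the help text, not the
  simulator's own parsing code; the names parse_bound and parse_range are
  hypothetical, and the sketch assumes the `:' separator is always present
  and that values are kept as strings (since program symbols are allowed).

```python
# Hypothetical helper names; this mirrors the documented grammar
#   {{@|#}<start>}:{{@|#|+}<end>}
# and is not taken from the SimpleScalar sources.
KINDS = {"@": "addr", "#": "cycle", "+": "relative", "": "inst"}


def parse_bound(text, allow_relative=False):
    """Parse one end of a range; return None when that end is omitted."""
    if not text:
        return None
    prefix = text[0] if text[0] in "@#+" else ""
    if prefix == "+" and not allow_relative:
        raise ValueError("'+' is only valid on the end of a range")
    value = text[len(prefix):]
    if not value:
        raise ValueError("missing value after prefix %r" % prefix)
    # Values stay as strings because program symbols (e.g. 'main') are allowed.
    return (KINDS[prefix], value)


def parse_range(spec):
    """Split '<start>:<end>' and classify each end."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    return parse_bound(start), parse_bound(end, allow_relative=True)


if __name__ == "__main__":
    for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(example, "->", parse_range(example))
```

  Each end is returned as a (kind, value) pair, or None when it is omitted,
  so `:' yields (None, None), which corresponds to tracing the entire
  execution.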

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
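
  As a rough aid for reading these strings, the sketch below splits a
  <config> value into its five fields and derives the total capacity as
  nsets * bsize * assoc. The names CacheConfig and parse_cache_config are
  hypothetical, and treating <bsize> as bytes is an assumption (for a TLB
  the "block" is a page).

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    """Fields of <name>:<nsets>:<bsize>:<assoc>:<repl> (hypothetical type)."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def size_bytes(self):
        # Capacity = sets * block size * ways (assuming bsize is in bytes).
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse one cache/TLB config string into a CacheConfig."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError("replacement policy must be 'l', 'f' or 'r'")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for cfg in ("dl1:512:64:2:l", "ul2:64:1024:8:l", "dl1:4096:32:1:l"):
        parsed = parse_cache_config(cfg)
        print(parsed.name, parsed.size_bytes // 1024, "KB")
```

  Under that assumption, the dl1:512:64:2:l configuration used in this run
  works out to 64 KB and ul2:64:1024:8:l to 512 KB.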

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
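
  The pointer idea can be modelled in a few lines. This is a simplified
  sketch, not the simulator's logic: resolve_icache is a hypothetical
  helper, the options are assumed to arrive as a plain dict, and il2 is
  assumed to default to "dl2" when it is not given.

```python
def resolve_icache(opts):
    """Return a copy of opts where il1/il2 pointers are replaced by the
    data-cache config they point at (simplified model)."""
    resolved = dict(opts)
    resolved.setdefault("il2", "dl2")  # assumption: il2 defaults to dl2
    for level in ("il1", "il2"):
        target = resolved[level]
        if target in ("dl1", "dl2"):
            # The instruction level shares the named data cache.
            resolved[level] = opts[target]
    return resolved


if __name__ == "__main__":
    unified_l2 = {
        "il1": "il1:128:64:1:l", "il2": "dl2",
        "dl1": "dl1:256:32:1:l", "dl2": "ul2:1024:64:2:l",
    }
    fully_unified = {
        "il1": "dl1",
        "dl1": "ul1:256:32:1:l", "dl2": "ul2:1024:64:2:l",
    }
    print(resolve_icache(unified_l2))
    print(resolve_icache(fully_unified))
```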



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148995 # total number of instructions executed
sim_total_refs              4034201 # total number of loads and stores executed
sim_total_loads             3020644 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6771200 # total simulation time in cycles
sim_IPC                      1.9357 # instructions per cycle
sim_CPI                      0.5166 # cycles per instruction
sim_exec_BW                  1.9419 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25188959 # cumulative IFQ occupancy
IFQ_fcount                  6171640 # cumulative IFQ full count
ifq_occupancy                3.7200 # avg IFQ occupancy (insn's)
ifq_rate                     1.9419 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9157 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9115 # fraction of time (cycle's) IFQ was full
RUU_count                 104038335 # cumulative RUU occupancy
RUU_fcount                  5607722 # cumulative RUU full count
ruu_occupancy               15.3648 # avg RUU occupancy (insn's)
ruu_rate                     1.9419 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9123 # avg RUU occupant latency (cycle's)
ruu_full                     0.8282 # fraction of time (cycle's) RUU was full
LSQ_count                  31766412 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6914 # avg LSQ occupancy (insn's)
lsq_rate                     1.9419 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4159 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152857094 # total number of slip cycles
avg_sim_slip                11.6621 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189495 # total number of accesses
il1.hits                   13189317 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013188 # total number of accesses
dl1.hits                    4007461 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6239 # total number of hits
ul2.misses                      158 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0247 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189495 # total number of accesses
itlb.hits                  13189489 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357475 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
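
The derived figures in the statistics block can be cross-checked from the raw
counters. The sketch below is a hypothetical helper (parse_stats is not part
of SimpleScalar) that reads `name value # description` lines and recomputes
IPC, CPI and the dl1 miss rate for the run above; the SAMPLE lines are copied
from that run with the descriptions shortened, and only plain numeric values
are kept.

```python
def parse_stats(lines):
    """Collect numeric 'name value # description' lines into a dict."""
    stats = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2 or fields[0].startswith("-"):
            continue  # skip options, help text, and error entries
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # non-numeric values such as 0x00400000 or 96k
    return stats


SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6771200 # total simulation time in cycles
dl1.accesses                4013188 # total number of accesses
dl1.misses                     5727 # total number of misses
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS rate
""".splitlines()

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

if __name__ == "__main__":
    print("IPC %.4f  CPI %.4f  dl1 miss rate %.4f" % (ipc, cpi, dl1_miss_rate))
```

Rounded to four places, these agree with the reported sim_IPC (1.9357),
sim_CPI (0.5166) and dl1.miss_rate (0.0014).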

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:01:27 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264893 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865507 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6294416 # total simulation time in cycles
sim_IPC                      1.8400 # instructions per cycle
sim_CPI                      0.5435 # cycles per instruction
sim_exec_BW                  1.9485 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18477855 # cumulative IFQ occupancy
IFQ_fcount                  3791134 # cumulative IFQ full count
ifq_occupancy                2.9356 # avg IFQ occupancy (insn's)
ifq_rate                     1.9485 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5066 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6023 # fraction of time (cycle's) IFQ was full
RUU_count                  76162063 # cumulative RUU occupancy
RUU_fcount                  3187244 # cumulative RUU full count
ruu_occupancy               12.0999 # avg RUU occupancy (insn's)
ruu_rate                     1.9485 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2098 # avg RUU occupant latency (cycle's)
ruu_full                     0.5064 # fraction of time (cycle's) RUU was full
LSQ_count                  31881645 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0651 # avg LSQ occupancy (insn's)
lsq_rate                     1.9485 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5994 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122109585 # total number of slip cycles
avg_sim_slip                10.5435 # the average slip between issue and retirement
bpred_bimod.lookups         3257776 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442060 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820428 # total number of accesses
il1.hits                   12820211 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18220 # total number of hits
ul2.misses                      281 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0152 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820428 # total number of accesses
itlb.hits                  12820421 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918790 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:01:34 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
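
  The range syntax above can be illustrated with a small Python sketch.  This
  is not the simulator's own parser: the names `Endpoint`, `parse_endpoint`
  and `parse_ptrace_range` are hypothetical, and the sketch assumes that the
  range is split at its first `:' and that `+' is accepted only on the end
  side, as the format line suggests.  Values are kept as text because program
  symbols are allowed.

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    kind: str   # "addr", "cycle", "inst" or "relative"
    value: str  # kept as text: may be a number or a program symbol


# Prefix character -> meaning, following the help text above.
_KINDS = {"@": "addr", "#": "cycle", "+": "relative", "": "inst"}


def parse_endpoint(text: str, allow_relative: bool) -> Optional[Endpoint]:
    """Parse one side of the range; an empty side means 'unbounded'."""
    if text == "":
        return None
    prefix = text[0] if text[0] in "@#+" else ""
    if prefix == "+" and not allow_relative:
        raise ValueError("'+' is only meaningful on the end of a range")
    value = text[len(prefix):]
    if value == "":
        raise ValueError("range prefix given without a value")
    return Endpoint(_KINDS[prefix], value)


def parse_ptrace_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' (assumed: at the first ':') and parse both sides."""
    start_text, sep, end_text = spec.partition(":")
    if sep != ":":
        raise ValueError("a pipetrace range must contain ':'")
    return parse_endpoint(start_text, False), parse_endpoint(end_text, True)


if __name__ == "__main__":
    for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(spec, "->", parse_ptrace_range(spec))
```

  Running the sketch on the five example ranges shows how each prefix is
  classified; resolving symbols such as `main' to addresses is left out.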

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (values listed in the order N, W, M, X):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
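
  As a rough illustration of the <config> format, the Python sketch below
  splits a config string into its five fields and derives the total capacity
  as nsets * bsize * assoc.  The names `CacheConfig` and `parse_cache_config`
  are hypothetical, and the capacity formula assumes <bsize> is given in
  bytes (for a TLB config, where <bsize> is the page size, the product is the
  amount of memory the TLB can map).

```python
from dataclasses import dataclass

# Replacement policy codes as listed in the help text.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total bytes covered: sets * block size * ways (bsize assumed in bytes)."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for text in ["dl1:4096:32:1:l", "dl1:512:64:2:l", "ul2:64:1024:8:l"]:
        cfg = parse_cache_config(text)
        print(f"{cfg.name}: {cfg.capacity // 1024} KB, "
              f"{cfg.assoc}-way, {REPLACEMENT[cfg.repl]}")
```

  Under these assumptions the dl1 config used in this run (dl1:512:64:2:l)
  works out to 64 KB and the ul2 config (ul2:64:1024:8:l) to 512 KB.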

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13377148 # total number of instructions executed
sim_total_refs              6748334 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                   8906648 # total simulation time in cycles
sim_IPC                      1.4956 # instructions per cycle
sim_CPI                      0.6686 # cycles per instruction
sim_exec_BW                  1.5019 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  34639858 # cumulative IFQ occupancy
IFQ_fcount                  8509263 # cumulative IFQ full count
ifq_occupancy                3.8892 # avg IFQ occupancy (insn's)
ifq_rate                     1.5019 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.5895 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9554 # fraction of time (cycle's) IFQ was full
RUU_count                 139498083 # cumulative RUU occupancy
RUU_fcount                  8367903 # cumulative RUU full count
ruu_occupancy               15.6622 # avg RUU occupancy (insn's)
ruu_rate                     1.5019 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.4281 # avg RUU occupant latency (cycle's)
ruu_full                     0.9395 # fraction of time (cycle's) RUU was full
LSQ_count                  72697156 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1621 # avg LSQ occupancy (insn's)
lsq_rate                     1.5019 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.4344 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  231896196 # total number of slip cycles
avg_sim_slip                17.4084 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400111 # total number of accesses
il1.hits                   13399384 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169524 # total number of accesses
dl1.hits                    5881797 # total number of hits
dl1.misses                   287727 # total number of misses
dl1.replacements             286703 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431898 # total number of accesses
ul2.hits                     429662 # total number of hits
ul2.misses                     2236 # total number of misses
ul2.replacements               1724 # total number of replacements
ul2.writebacks                 1367 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0052 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0040 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0032 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400111 # total number of accesses
itlb.hits                  13400092 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736797 # total number of hits
dtlb.misses                    4201 # total number of misses
dtlb.replacements              4073 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766276 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
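
The derived rates in the statistics above can be cross-checked from the raw
counters.  The Python sketch below is only an illustration: `parse_stats` is
a hypothetical helper, not part of SimpleScalar, and it assumes every
statistic line has the form `<name> <value> # <description>`.  Lines whose
value is not a plain number (hex addresses, `760k', error messages) are
skipped.  The sample lines are copied from the matrix run above.

```python
def parse_stats(lines):
    """Return {name: float} for lines shaped like '<name> <value> # <comment>'."""
    stats = {}
    for line in lines:
        body = line.split("#", 1)[0].split()
        if len(body) != 2 or body[0].startswith("-"):
            continue  # not a statistic line (blank, option dump, error text)
        name, value = body
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # non-numeric value such as 0x00400000 or 760k
    return stats


sample = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   8906648 # total simulation time in cycles
sim_IPC                      1.4956 # instructions per cycle
dl1.accesses                6169524 # total number of accesses
dl1.misses                   287727 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
""".splitlines()

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

if __name__ == "__main__":
    print(f"IPC recomputed:           {ipc:.4f} (reported {stats['sim_IPC']:.4f})")
    print(f"dl1 miss rate recomputed: {dl1_miss_rate:.4f} "
          f"(reported {stats['dl1.miss_rate']:.4f})")
```

Both recomputed values agree with the reported sim_IPC and dl1.miss_rate to
four decimal places.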

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:01:45 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450120 # total number of instructions executed
sim_total_refs              6589103 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12521845 # total simulation time in cycles
sim_IPC                      1.7074 # instructions per cycle
sim_CPI                      0.5857 # cycles per instruction
sim_exec_BW                  1.7130 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48928624 # cumulative IFQ occupancy
IFQ_fcount                 11612267 # cumulative IFQ full count
ifq_occupancy                3.9075 # avg IFQ occupancy (insn's)
ifq_rate                     1.7130 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2810 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9274 # fraction of time (cycle's) IFQ was full
RUU_count                 198833184 # cumulative RUU occupancy
RUU_fcount                 12389266 # cumulative RUU full count
ruu_occupancy               15.8789 # avg RUU occupancy (insn's)
ruu_rate                     1.7130 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2696 # avg RUU occupant latency (cycle's)
ruu_full                     0.9894 # fraction of time (cycle's) RUU was full
LSQ_count                  63756678 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0916 # avg LSQ occupancy (insn's)
lsq_rate                     1.7130 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9723 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290373150 # total number of slip cycles
avg_sim_slip                13.5820 # the average slip between issue and retirement
bpred_bimod.lookups         1654485 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           77 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483029 # total number of accesses
il1.hits                   21482850 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944146 # total number of accesses
dl1.hits                    4941015 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3929 # total number of hits
ul2.misses                      172 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0419 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483029 # total number of accesses
itlb.hits                  21483023 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886370 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:01:58 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
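
The range syntax described above can be illustrated with a small Python sketch. It is not the simulator's own parser: the function name `parse_ptrace_range` and the kind labels ("addr", "cycle", "inst", "relative") are made up for illustration, and values are kept as strings because the help text allows program symbols such as `main`.

```python
def parse_ptrace_range(spec):
    """Split '<start>:<end>' into two (kind, value) pairs, or None if omitted."""
    if ":" not in spec:
        raise ValueError("range must contain ':'")
    start_text, end_text = spec.split(":", 1)

    def parse_end(text, allow_plus):
        if text == "":
            return None                      # this end of the range was omitted
        prefix = text[0]
        if prefix == "@":
            return ("addr", text[1:])        # address range
        if prefix == "#":
            return ("cycle", text[1:])       # cycle count range
        if prefix == "+":
            if not allow_plus:
                raise ValueError("'+' is only valid on the end of the range")
            return ("relative", text[1:])    # offset from the start value
        return ("inst", text)                # default: instruction count

    return parse_end(start_text, False), parse_end(end_text, True)


for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
    print(example, "->", parse_ptrace_range(example))
```

The sketch only splits and labels the two ends; resolving symbols or turning a relative end into an absolute one is left out.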

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
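
As a rough illustration of the sample predictors, the Python sketch below names a 2-level configuration from its four arguments. It assumes the arguments arrive in the order shown on the `-bpred:2lev` option line, i.e. `<l1size> <l2size> <hist_size> <xor>` (N, M, W, X); the function `classify_two_level` is hypothetical and applies the listed conditions literally.

```python
def classify_two_level(l1size, l2size, hist_size, xor):
    """Return the sample-predictor name matching (N, M, W, X), or 'other'."""
    table = 2 ** hist_size               # 2^W entries indexed by the history bits
    if l1size == 1:                      # a single (global) shift register
        if l2size == table:
            return "gshare" if xor else "GAg"
        if l2size > table and not xor:
            return "GAp"
    elif not xor:                        # several (per-address) shift registers
        if l2size == table:
            return "PAg"
        if l2size == 2 ** (l1size + hist_size):
            return "PAp"
    return "other"


# The value listed in the options of this run: -bpred:2lev 1 1024 8 0
print(classify_two_level(1, 1024, 8, 0))
```

Under these assumptions the `1 1024 8 0` setting falls in the GAp row, since 1024 is larger than 2^8. That setting only takes effect when `-bpred 2lev` or `-bpred comb` is selected; the runs logged here use `bimod`.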

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
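
To make the `<config>` fields concrete, the following Python sketch parses a config string and derives the total capacity as nsets x bsize x assoc. The names `CacheConfig` and `parse_cache_config` are illustrative only and are not part of SimpleScalar.

```python
from typing import NamedTuple

REPLACEMENT_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity: sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPLACEMENT_POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:4096:32:1:l", "dl1:512:64:2:l", "ul2:64:1024:8:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
          f"{REPLACEMENT_POLICIES[cfg.repl]}")
```

With this reading, the `dl1:512:64:2:l` setting used in these runs is a 64 KB, 2-way cache with 64-byte blocks, and `ul2:64:1024:8:l` is a 512 KB, 8-way cache with 1024-byte blocks.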

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27863235 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  36726629 # total simulation time in cycles
sim_IPC                      0.7586 # instructions per cycle
sim_CPI                      1.3183 # cycles per instruction
sim_exec_BW                  0.7587 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 146807479 # cumulative IFQ occupancy
IFQ_fcount                 36701628 # cumulative IFQ full count
ifq_occupancy                3.9973 # avg IFQ occupancy (insn's)
ifq_rate                     0.7587 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2689 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9993 # fraction of time (cycle's) IFQ was full
RUU_count                 587230597 # cumulative RUU occupancy
RUU_fcount                 36700408 # cumulative RUU full count
ruu_occupancy               15.9892 # avg RUU occupancy (insn's)
ruu_rate                     0.7587 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.0755 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 176039956 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7933 # avg LSQ occupancy (insn's)
lsq_rate                     0.7587 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3180 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  799778867 # total number of slip cycles
avg_sim_slip                28.7074 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860776 # total number of accesses
il1.hits                   27860565 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5831566 # total number of hits
dl1.misses                  2821832 # total number of misses
dl1.replacements            2820808 # total number of replacements
dl1.writebacks               953977 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3261 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3260 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3776020 # total number of accesses
ul2.hits                    3771720 # total number of hits
ul2.misses                     4300 # total number of misses
ul2.replacements               3788 # total number of replacements
ul2.writebacks                 1258 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0011 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0010 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0003 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860776 # total number of accesses
itlb.hits                  27860770 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017686 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
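
The derived figures in the statistics block can be cross-checked against the raw counters. The Python sketch below assumes each statistic line has the form `name value # description`; the helper `parse_stats` is hypothetical, and the sample lines are copied from the run above.

```python
def parse_stats(text):
    """Map each statistic name to its raw value string."""
    stats = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()   # drop the trailing description
        parts = body.split(None, 1)            # name, then the rest as the value
        if len(parts) == 2:
            stats[parts[0]] = parts[1]
    return stats


SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  36726629 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2821832 # total number of misses
"""

stats = parse_stats(SAMPLE)
insn = int(stats["sim_num_insn"])
cycles = int(stats["sim_cycle"])
ipc = insn / cycles                            # instructions per cycle
cpi = cycles / insn                            # cycles per instruction
dl1_miss_rate = int(stats["dl1.misses"]) / int(stats["dl1.accesses"])
print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  dl1 miss rate {dl1_miss_rate:.4f}")
```

Rounded to four decimals, the three ratios should agree with the reported `sim_IPC`, `sim_CPI`, and `dl1.miss_rate` lines. Values are kept as strings until needed because some entries are not plain numbers (for example `0x00400000` or `1480k`).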

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:02:21 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148995 # total number of instructions executed
sim_total_refs              4034201 # total number of loads and stores executed
sim_total_loads             3020644 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6771200 # total simulation time in cycles
sim_IPC                      1.9357 # instructions per cycle
sim_CPI                      0.5166 # cycles per instruction
sim_exec_BW                  1.9419 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25188959 # cumulative IFQ occupancy
IFQ_fcount                  6171640 # cumulative IFQ full count
ifq_occupancy                3.7200 # avg IFQ occupancy (insn's)
ifq_rate                     1.9419 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9157 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9115 # fraction of time (cycle's) IFQ was full
RUU_count                 104038335 # cumulative RUU occupancy
RUU_fcount                  5607722 # cumulative RUU full count
ruu_occupancy               15.3648 # avg RUU occupancy (insn's)
ruu_rate                     1.9419 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9123 # avg RUU occupant latency (cycle's)
ruu_full                     0.8282 # fraction of time (cycle's) RUU was full
LSQ_count                  31766412 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6914 # avg LSQ occupancy (insn's)
lsq_rate                     1.9419 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4159 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152857094 # total number of slip cycles
avg_sim_slip                11.6621 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189495 # total number of accesses
il1.hits                   13189317 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013188 # total number of accesses
dl1.hits                    4007461 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6239 # total number of hits
ul2.misses                      158 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0247 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189495 # total number of accesses
itlb.hits                  13189489 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357475 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:02:29 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
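
The range grammar above can be sketched as a small parser. This is an
illustration only, not the simulator's own code: the names `Endpoint` and
`parse_range` are hypothetical, values are kept as text because program
symbols are allowed, and a `+' end is assumed to inherit its kind from the
start (or to count instructions when no start is given).

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    kind: str       # "address", "cycle", or "inst"
    value: str      # left as text: may be a number or a program symbol
    relative: bool  # True when the end was written with a leading '+'


_PREFIX_KINDS = {"@": "address", "#": "cycle"}


def _parse_endpoint(text: str) -> Optional[Endpoint]:
    """Parse one side of the range; an empty side means 'unbounded'."""
    if not text:
        return None
    if text[0] in _PREFIX_KINDS:
        return Endpoint(_PREFIX_KINDS[text[0]], text[1:], False)
    return Endpoint("inst", text, False)


def parse_range(text: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' and classify both ends."""
    start_text, sep, end_text = text.partition(":")
    if not sep:
        raise ValueError("a pipetrace range must contain ':'")
    start = _parse_endpoint(start_text)
    if end_text.startswith("+"):
        # Assumption: a relative end uses the same unit as the start.
        kind = start.kind if start else "inst"
        return start, Endpoint(kind, end_text[1:], True)
    return start, _parse_endpoint(end_text)


for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
    print(example, "->", parse_range(example))
```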

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
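
As a rough illustration of the <config> format, the sketch below splits a
config string into its five fields and derives the total capacity as
nsets * bsize * assoc.  The helper names (`CacheConfig`,
`parse_cache_config`) are hypothetical and not part of SimpleScalar; the
two strings parsed at the end are the dl1 and ul2 settings from this run's
command line.

```python
from typing import NamedTuple

# Replacement-policy letters as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity: sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for spec in ("dl1:512:64:2:l", "ul2:32:1024:8:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, cfg.capacity_bytes // 1024, "KB", REPL_POLICIES[cfg.repl])
```

With these settings the L1 data cache works out to 64 KB and the unified
L2 to 256 KB.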

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264893 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865507 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6298868 # total simulation time in cycles
sim_IPC                      1.8387 # instructions per cycle
sim_CPI                      0.5439 # cycles per instruction
sim_exec_BW                  1.9472 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18477855 # cumulative IFQ occupancy
IFQ_fcount                  3791134 # cumulative IFQ full count
ifq_occupancy                2.9335 # avg IFQ occupancy (insn's)
ifq_rate                     1.9472 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5066 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6019 # fraction of time (cycle's) IFQ was full
RUU_count                  76162063 # cumulative RUU occupancy
RUU_fcount                  3187244 # cumulative RUU full count
ruu_occupancy               12.0914 # avg RUU occupancy (insn's)
ruu_rate                     1.9472 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2098 # avg RUU occupant latency (cycle's)
ruu_full                     0.5060 # fraction of time (cycle's) RUU was full
LSQ_count                  31881645 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0615 # avg LSQ occupancy (insn's)
lsq_rate                     1.9472 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5994 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122109585 # total number of slip cycles
avg_sim_slip                10.5435 # the average slip between issue and retirement
bpred_bimod.lookups         3257776 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442060 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820428 # total number of accesses
il1.hits                   12820211 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18212 # total number of hits
ul2.misses                      289 # total number of misses
ul2.replacements                 34 # total number of replacements
ul2.writebacks                   21 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0156 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0018 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0011 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820428 # total number of accesses
itlb.hits                  12820421 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918790 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
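
The derived figures in a statistics block can be re-computed from the raw
counters.  The sketch below is a minimal, hypothetical parser (the name
`parse_stats` is not part of SimpleScalar) that reads "name value # comment"
lines and re-derives IPC and CPI from sim_num_insn and sim_cycle, using four
lines copied from the `sort' run above.

```python
def parse_stats(lines):
    """Collect numeric 'name value # comment' lines into a dict of floats."""
    stats = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # banners, blank lines, option dumps
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            pass  # non-decimal values such as 0x00400000 or 292k
    return stats


sample = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6298868 # total simulation time in cycles
sim_IPC                      1.8387 # instructions per cycle
sim_CPI                      0.5439 # cycles per instruction
""".splitlines()

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
print(f"IPC {ipc:.4f} (reported {stats['sim_IPC']:.4f})")
print(f"CPI {cpi:.4f} (reported {stats['sim_CPI']:.4f})")
```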

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:02:36 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate           951493.4286 # simulation speed (in insts/sec)
sim_total_insn             13377157 # total number of instructions executed
sim_total_refs              6748338 # total number of loads and stores executed
sim_total_loads             3824313 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                  15299651 # total simulation time in cycles
sim_IPC                      0.8707 # instructions per cycle
sim_CPI                      1.1485 # cycles per instruction
sim_exec_BW                  0.8743 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  60213709 # cumulative IFQ occupancy
IFQ_fcount                 14902266 # cumulative IFQ full count
ifq_occupancy                3.9356 # avg IFQ occupancy (insn's)
ifq_rate                     0.8743 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  4.5012 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9740 # fraction of time (cycle's) IFQ was full
RUU_count                 241798338 # cumulative RUU occupancy
RUU_fcount                 14761740 # cumulative RUU full count
ruu_occupancy               15.8042 # avg RUU occupancy (insn's)
ruu_rate                     0.8743 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 18.0755 # avg RUU occupant latency (cycle's)
ruu_full                     0.9648 # fraction of time (cycle's) RUU was full
LSQ_count                 126679107 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2799 # avg LSQ occupancy (insn's)
lsq_rate                     0.8743 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  9.4698 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  388178380 # total number of slip cycles
avg_sim_slip                29.1405 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400120 # total number of accesses
il1.hits                   13399393 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169525 # total number of accesses
dl1.hits                    5881792 # total number of hits
dl1.misses                   287733 # total number of misses
dl1.replacements             286709 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431904 # total number of accesses
ul2.hits                     412579 # total number of hits
ul2.misses                    19325 # total number of misses
ul2.replacements              19069 # total number of replacements
ul2.writebacks                17306 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0447 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0442 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0401 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400120 # total number of accesses
itlb.hits                  13400101 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736797 # total number of hits
dtlb.misses                    4201 # total number of misses
dtlb.replacements              4073 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766324 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
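
The cache counters of the `fft' run can be cross-checked against each other.
In these numbers the unified L2 access count equals the sum of L1 instruction
misses, L1 data misses, and L1 data writebacks; that is an observation about
this output, not a documented guarantee of the simulator.  The sketch below
copies the relevant counters by hand and also re-derives the reported L1 data
and L2 miss rates.

```python
# Counters copied from the fft statistics block above.
fft = {
    "il1.misses": 727,
    "dl1.accesses": 6169525,
    "dl1.misses": 287733,
    "dl1.writebacks": 143444,
    "ul2.accesses": 431904,
    "ul2.misses": 19325,
}

# Traffic reaching the unified L2 from both L1 caches.
l2_traffic = fft["il1.misses"] + fft["dl1.misses"] + fft["dl1.writebacks"]
print("L1 traffic matches ul2.accesses:", l2_traffic == fft["ul2.accesses"])

# Miss rates as defined in the log: misses per access to that cache.
dl1_miss_rate = fft["dl1.misses"] / fft["dl1.accesses"]
ul2_miss_rate = fft["ul2.misses"] / fft["ul2.accesses"]
print(f"dl1.miss_rate {dl1_miss_rate:.4f}")
print(f"ul2.miss_rate {ul2_miss_rate:.4f}")
```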

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:02:50 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
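
The -mem:lat and -mem:width options above can be read as a simple block transfer cost. The sketch below assumes the usual first-chunk plus inter-chunk model: the first bus-width chunk costs <first_chunk> cycles and every further chunk costs <inter_chunk> cycles. The function name block_fetch_latency is hypothetical, and the effect of -mem:maxBurstLength and -mem:minBurstLength is not described in this log, so it is deliberately ignored here.

```python
def block_fetch_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Estimate cycles to transfer one block from memory.

    Assumed model (not confirmed by this log): the first bus-width chunk
    costs `first_chunk` cycles, each remaining chunk costs `inter_chunk`.
    Burst-length limits are ignored.
    """
    if block_bytes <= 0 or bus_width <= 0:
        raise ValueError("block size and bus width must be positive")
    chunks = -(-block_bytes // bus_width)  # ceiling division
    return first_chunk + inter_chunk * (chunks - 1)


# Values taken from this run: -mem:lat 60 3, -mem:width 16,
# and a 1024-byte block from -cache:dl2 ul2:32:1024:8:l.
ul2_fill_cycles = block_fetch_latency(1024, 16, 60, 3)
print(ul2_fill_cycles)
```

Under this assumed model, larger L2 blocks raise the cost of each miss linearly in the number of bus-width chunks, which is one reason the block size and bus width are varied together across runs.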

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
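
The range syntax described above can be illustrated with a small parser. This is a minimal sketch, not the simulator's own parsing code: the function name parse_ptrace_range and the kind labels ("address", "cycle", "relative", "instruction") are hypothetical. Values are kept as strings because program symbols such as main are allowed.

```python
def parse_ptrace_range(spec):
    """Split '<start>:<end>' into two (kind, value) pairs, or None if omitted."""
    kinds = {"@": "address", "#": "cycle", "+": "relative"}
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    def classify(text):
        if text == "":
            return None  # this end of the range was left out
        if text[0] in kinds:
            return (kinds[text[0]], text[1:])
        return ("instruction", text)  # no prefix: instruction count

    start_part = classify(start)
    if start_part is not None and start_part[0] == "relative":
        raise ValueError("'+' is only valid on the end of the range")
    return start_part, classify(end)


for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
    print(example, parse_ptrace_range(example))
```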

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
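
To make the <config> format concrete, the sketch below splits a config string into its five fields and derives the total capacity as nsets * bsize * assoc. The names CacheConfig and parse_cache_config are hypothetical helpers, not part of the simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Total data capacity: sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields")
    name, nsets, bsize, assoc, repl = fields
    if repl not in ("l", "f", "r"):
        raise ValueError("replacement policy must be 'l', 'f', or 'r'")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# Configurations used by this run.
for text in ["dl1:512:64:2:l", "ul2:32:1024:8:l"]:
    cfg = parse_cache_config(text)
    print(cfg.name, cfg.capacity_bytes // 1024, "KB")
```

For the configurations in this run, dl1:512:64:2:l works out to 64 KB and ul2:32:1024:8:l to 256 KB.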

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450120 # total number of instructions executed
sim_total_refs              6589103 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12521845 # total simulation time in cycles
sim_IPC                      1.7074 # instructions per cycle
sim_CPI                      0.5857 # cycles per instruction
sim_exec_BW                  1.7130 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48928624 # cumulative IFQ occupancy
IFQ_fcount                 11612267 # cumulative IFQ full count
ifq_occupancy                3.9075 # avg IFQ occupancy (insn's)
ifq_rate                     1.7130 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2810 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9274 # fraction of time (cycle's) IFQ was full
RUU_count                 198833184 # cumulative RUU occupancy
RUU_fcount                 12389266 # cumulative RUU full count
ruu_occupancy               15.8789 # avg RUU occupancy (insn's)
ruu_rate                     1.7130 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2696 # avg RUU occupant latency (cycle's)
ruu_full                     0.9894 # fraction of time (cycle's) RUU was full
LSQ_count                  63756678 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0916 # avg LSQ occupancy (insn's)
lsq_rate                     1.7130 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9723 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290373150 # total number of slip cycles
avg_sim_slip                13.5820 # the average slip between issue and retirement
bpred_bimod.lookups         1654485 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           77 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483029 # total number of accesses
il1.hits                   21482850 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944146 # total number of accesses
dl1.hits                    4941015 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3929 # total number of hits
ul2.misses                      172 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0419 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483029 # total number of accesses
itlb.hits                  21483023 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886370 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
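
The derived statistics in the block above can be cross-checked against the raw counters. The sketch below is a minimal, hypothetical parser (parse_stats is not a SimpleScalar tool) that reads "name value # comment" lines, skips non-numeric values such as hex addresses or error strings, and recomputes IPC and the dl1 miss rate from a few lines copied from this run.

```python
def parse_stats(lines):
    """Collect numeric 'name value # comment' statistics into a dict."""
    stats = {}
    for line in lines:
        body = line.split("#", 1)[0].split()
        if len(body) != 2:
            continue  # not a simple name/value line
        name, value = body
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # e.g. hex addresses or sizes like '120k'
    return stats


sample = """\
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12521845 # total simulation time in cycles
sim_IPC                      1.7074 # instructions per cycle
dl1.accesses                4944146 # total number of accesses
dl1.misses                     3131 # total number of misses
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
""".splitlines()

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"IPC {ipc:.4f}, dl1 miss rate {miss_rate:.4f}")
```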

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:03:03 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 22 # total simulation time in seconds
sim_inst_rate          1266349.6364 # simulation speed (in insts/sec)
sim_total_insn             27863235 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  36726629 # total simulation time in cycles
sim_IPC                      0.7586 # instructions per cycle
sim_CPI                      1.3183 # cycles per instruction
sim_exec_BW                  0.7587 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 146807479 # cumulative IFQ occupancy
IFQ_fcount                 36701628 # cumulative IFQ full count
ifq_occupancy                3.9973 # avg IFQ occupancy (insn's)
ifq_rate                     0.7587 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2689 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9993 # fraction of time (cycle's) IFQ was full
RUU_count                 587230597 # cumulative RUU occupancy
RUU_fcount                 36700408 # cumulative RUU full count
ruu_occupancy               15.9892 # avg RUU occupancy (insn's)
ruu_rate                     0.7587 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.0755 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 176039956 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7933 # avg LSQ occupancy (insn's)
lsq_rate                     0.7587 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3180 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  799778867 # total number of slip cycles
avg_sim_slip                28.7074 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860776 # total number of accesses
il1.hits                   27860565 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5831566 # total number of hits
dl1.misses                  2821832 # total number of misses
dl1.replacements            2820808 # total number of replacements
dl1.writebacks               953977 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3261 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3260 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3776020 # total number of accesses
ul2.hits                    3771720 # total number of hits
ul2.misses                     4300 # total number of misses
ul2.replacements               4044 # total number of replacements
ul2.writebacks                 1354 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0011 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0011 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860776 # total number of accesses
itlb.hits                  27860770 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017686 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:03:25 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
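
  The range syntax above can be illustrated with a small parser. This is a
  minimal sketch, not the simulator's own code: the function name
  `parse_ptrace_range` and the labels "address", "cycle", "inst", and
  "relative" are hypothetical, and values are kept as strings because
  program symbols are allowed.

```python
def parse_ptrace_range(arg):
    """Split a pipetrace range such as '@main:+278' into (start, end).

    Each side is None when omitted, otherwise a (kind, value) tuple.
    Hypothetical helper; it only mirrors the format described in the help text.
    """
    start, sep, end = arg.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    def classify(token, allow_relative):
        if token == "":
            return None                      # this end of the range is omitted
        if token[0] == "@":
            return ("address", token[1:])    # address range
        if token[0] == "#":
            return ("cycle", token[1:])      # cycle count range
        if token[0] == "+":
            if not allow_relative:
                raise ValueError("'+' is only valid on the end of the range")
            return ("relative", token[1:])   # offset from the start value
        return ("inst", token)               # instruction count range

    return classify(start, False), classify(end, True)


examples = ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]
parsed = {text: parse_ptrace_range(text) for text in examples}
```

  Under these assumptions, `parsed["@main:+278"]` holds an address start
  (`main`) and a relative end (`278`), and `parsed[":"]` is `(None, None)`,
  which corresponds to tracing the entire execution.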

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
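
  To make the <config> format concrete, the sketch below splits a config
  string into its five fields and derives a total capacity. The names
  `CacheConfig` and `parse_cache_config` are hypothetical, and the capacity
  formula (nsets * bsize * assoc, with bsize taken to be in bytes) is the
  usual set-associative relationship, assumed here rather than stated by the
  help text.

```python
from dataclasses import dataclass

# Replacement policy codes as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Assumption: capacity = sets * block size (bytes) * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (hypothetical helper)."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc),
                       REPL_POLICIES[repl])


dl1 = parse_cache_config("dl1:512:64:2:l")    # L1 data cache used in this run
ul2 = parse_cache_config("ul2:16:4096:8:l")   # unified L2 used in this run
```

  With the configurations from this run, the sketch gives 65536 bytes
  (64 KB) for dl1 and 524288 bytes (512 KB) for ul2, both with LRU
  replacement.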

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148987 # total number of instructions executed
sim_total_refs              4034199 # total number of loads and stores executed
sim_total_loads             3020642 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010951 # total number of branches executed
sim_cycle                   6776003 # total simulation time in cycles
sim_IPC                      1.9343 # instructions per cycle
sim_CPI                      0.5170 # cycles per instruction
sim_exec_BW                  1.9405 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25202167 # cumulative IFQ occupancy
IFQ_fcount                  6174941 # cumulative IFQ full count
ifq_occupancy                3.7193 # avg IFQ occupancy (insn's)
ifq_rate                     1.9405 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9167 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9113 # fraction of time (cycle's) IFQ was full
RUU_count                 104083265 # cumulative RUU occupancy
RUU_fcount                  5611063 # cumulative RUU full count
ruu_occupancy               15.3606 # avg RUU occupancy (insn's)
ruu_rate                     1.9405 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9157 # avg RUU occupant latency (cycle's)
ruu_full                     0.8281 # fraction of time (cycle's) RUU was full
LSQ_count                  31781069 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6902 # avg LSQ occupancy (insn's)
lsq_rate                     1.9405 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4170 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152913166 # total number of slip cycles
avg_sim_slip                11.6664 # the average slip between issue and retirement
bpred_bimod.lookups         1010971 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000565 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189482 # total number of accesses
il1.hits                   13189304 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013192 # total number of accesses
dl1.hits                    4007465 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6355 # total number of hits
ul2.misses                       42 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189482 # total number of accesses
itlb.hits                  13189476 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357423 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
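
The derived rates in the statistics block above can be recomputed from the raw
counters. The following is a minimal sketch; the function name `parse_stats`
is hypothetical, and the sample lines are copied from the matrix run above.
It assumes each statistic line has the form `<name> <value> # <description>`
and skips lines whose value is not a plain number.

```python
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_num_branches            1010850 # total number of branches committed
sim_total_insn             13148987 # total number of instructions executed
sim_cycle                   6776003 # total simulation time in cycles
"""


def parse_stats(text):
    """Return {name: value} for lines shaped like '<name> <value> # <desc>'."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue                  # not a simple name/value line
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue                  # e.g. '1480k' or other non-numeric values
    return stats


stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]        # committed insts per cycle
cpi = stats["sim_cycle"] / stats["sim_num_insn"]        # cycles per committed inst
exec_bw = stats["sim_total_insn"] / stats["sim_cycle"]  # incl. mis-speculated insts
ipb = stats["sim_num_insn"] / stats["sim_num_branches"] # instructions per branch
```

Rounded to four decimal places, these reproduce the reported sim_IPC (1.9343),
sim_CPI (0.5170), sim_exec_BW (1.9405), and sim_IPB (12.9664) for this run.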

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:03:34 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264941 # total number of instructions executed
sim_total_refs              4823983 # total number of loads and stores executed
sim_total_loads             2865508 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197077 # total number of branches executed
sim_cycle                   6303515 # total simulation time in cycles
sim_IPC                      1.8373 # instructions per cycle
sim_CPI                      0.5443 # cycles per instruction
sim_exec_BW                  1.9457 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18508066 # cumulative IFQ occupancy
IFQ_fcount                  3798673 # cumulative IFQ full count
ifq_occupancy                2.9362 # avg IFQ occupancy (insn's)
ifq_rate                     1.9457 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5090 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6026 # fraction of time (cycle's) IFQ was full
RUU_count                  76274479 # cumulative RUU occupancy
RUU_fcount                  3194672 # cumulative RUU full count
ruu_occupancy               12.1003 # avg RUU occupancy (insn's)
ruu_rate                     1.9457 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2189 # avg RUU occupant latency (cycle's)
ruu_full                     0.5068 # fraction of time (cycle's) RUU was full
LSQ_count                  31923900 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0645 # avg LSQ occupancy (insn's)
lsq_rate                     1.9457 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6029 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122264095 # total number of slip cycles
avg_sim_slip                10.5568 # the average slip between issue and retirement
bpred_bimod.lookups         3257788 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442071 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820458 # total number of accesses
il1.hits                   12820241 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18428 # total number of hits
ul2.misses                       73 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0039 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820458 # total number of accesses
itlb.hits                  12820451 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918912 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:03:41 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
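
  The range syntax above can be illustrated with a small parser. This is a
  minimal sketch, not SimpleScalar code: the names parse_endpoint and
  parse_ptrace_range are hypothetical, and values are kept as strings because
  they may be numbers or program symbols.

```python
# Hypothetical helper, not part of SimpleScalar: classify the two endpoints
# of a -ptrace range such as "#0:#1000" or "@main:+278".
KINDS = {"@": "address", "#": "cycle", "+": "relative", "": "inst"}

def parse_endpoint(text, allow_relative):
    """Return (kind, value) for one endpoint, or None if it was omitted."""
    if not text:
        return None
    prefix = text[0] if text[0] in "@#+" else ""
    if prefix == "+" and not allow_relative:
        raise ValueError("'+' is only meaningful on the end of the range")
    return (KINDS[prefix], text[len(prefix):])

def parse_ptrace_range(text):
    """Split '<start>:<end>' into a (start, end) pair of endpoints."""
    start, sep, end = text.partition(":")
    if not sep:
        raise ValueError("a pipetrace range must contain ':'")
    return parse_endpoint(start, False), parse_endpoint(end, True)

for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
    print(example, "->", parse_ptrace_range(example))
```

  An omitted endpoint is reported as None, which matches the rule that both
  ends of the range are optional.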

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
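
  To make the <name>:<nsets>:<bsize>:<assoc>:<repl> format concrete, the
  sketch below parses a config string and derives the total capacity as
  sets x block size x associativity. The helper names are hypothetical and
  not part of SimpleScalar; the two strings are the dl1 and dl2 settings
  from the command line of this run.

```python
from collections import namedtuple

# Hypothetical helper types, not part of SimpleScalar.
CacheConfig = namedtuple("CacheConfig", "name nsets bsize assoc repl")
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}

def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)

def capacity_bytes(cfg):
    """Total capacity: number of sets x block size x associativity."""
    return cfg.nsets * cfg.bsize * cfg.assoc

dl1 = parse_cache_config("dl1:512:64:2:l")
ul2 = parse_cache_config("ul2:16:4096:8:l")
print(dl1.name, capacity_bytes(dl1) // 1024, "KB", REPL_POLICIES[dl1.repl])
print(ul2.name, capacity_bytes(ul2) // 1024, "KB", REPL_POLICIES[ul2.repl])
```

  Under these settings the L1 data cache holds 64 KB and the unified L2
  holds 512 KB, both with LRU replacement.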



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13384924 # total number of instructions executed
sim_total_refs              6748334 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                   9318011 # total simulation time in cycles
sim_IPC                      1.4296 # instructions per cycle
sim_CPI                      0.6995 # cycles per instruction
sim_exec_BW                  1.4365 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36086309 # cumulative IFQ occupancy
IFQ_fcount                  8870847 # cumulative IFQ full count
ifq_occupancy                3.8727 # avg IFQ occupancy (insn's)
ifq_rate                     1.4365 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.6960 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9520 # fraction of time (cycle's) IFQ was full
RUU_count                 145353924 # cumulative RUU occupancy
RUU_fcount                  8727525 # cumulative RUU full count
ruu_occupancy               15.5992 # avg RUU occupancy (insn's)
ruu_rate                     1.4365 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.8595 # avg RUU occupant latency (cycle's)
ruu_full                     0.9366 # fraction of time (cycle's) RUU was full
LSQ_count                  75496770 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1022 # avg LSQ occupancy (insn's)
lsq_rate                     1.4365 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.6404 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  240472099 # total number of slip cycles
avg_sim_slip                18.0522 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400110 # total number of accesses
il1.hits                   13399383 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169527 # total number of accesses
dl1.hits                    5881803 # total number of hits
dl1.misses                   287724 # total number of misses
dl1.replacements             286700 # total number of replacements
dl1.writebacks               143442 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     431198 # total number of hits
ul2.misses                      695 # total number of misses
ul2.replacements                567 # total number of replacements
ul2.writebacks                  456 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0016 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0013 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0011 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400110 # total number of accesses
itlb.hits                  13400091 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736794 # total number of hits
dtlb.misses                    4204 # total number of misses
dtlb.replacements              4076 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766272 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
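
The statistics lines above share one layout, "<name> <value> # <description>",
so the derived rates can be cross-checked from the raw counters. The sketch
below is a hypothetical post-processing helper, not a SimpleScalar tool; it
assumes statistic names start with a letter and keeps non-numeric values
(such as "760k") as strings. The sample lines are copied from the fft run.

```python
import re

# "<name> <value> # <description>"; the value may contain spaces,
# as in "<error: divide by zero>".
STAT_LINE = re.compile(r"^([A-Za-z]\S*)\s+(.*?)\s+#\s")

def parse_stats(lines):
    """Collect statistics into a dict, converting numeric values to float."""
    stats = {}
    for line in lines:
        match = STAT_LINE.match(line)
        if match is None:
            continue  # banner, option, or help-text line
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. "760k" or a hexadecimal address
    return stats

sample = [
    "sim_num_insn               13320908 # total number of instructions committed",
    "sim_cycle                   9318011 # total simulation time in cycles",
    "dl1.accesses                6169527 # total number of accesses",
    "dl1.misses                   287724 # total number of misses",
    "mem.page_mem                   760k # total size of memory pages allocated",
]
stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"IPC {ipc:.4f}, dl1 miss rate {dl1_miss_rate:.4f}")
```

The recomputed values agree with the reported sim_IPC (1.4296) and
dl1.miss_rate (0.0466) for this run.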

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:03:52 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450132 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939055 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12527624 # total simulation time in cycles
sim_IPC                      1.7066 # instructions per cycle
sim_CPI                      0.5860 # cycles per instruction
sim_exec_BW                  1.7122 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48945735 # cumulative IFQ occupancy
IFQ_fcount                 11616508 # cumulative IFQ full count
ifq_occupancy                3.9070 # avg IFQ occupancy (insn's)
ifq_rate                     1.7122 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2818 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9273 # fraction of time (cycle's) IFQ was full
RUU_count                 198892027 # cumulative RUU occupancy
RUU_fcount                 12393363 # cumulative RUU full count
ruu_occupancy               15.8763 # avg RUU occupancy (insn's)
ruu_rate                     1.7122 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2723 # avg RUU occupant latency (cycle's)
ruu_full                     0.9893 # fraction of time (cycle's) RUU was full
LSQ_count                  63774581 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0907 # avg LSQ occupancy (insn's)
lsq_rate                     1.7122 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9732 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290449827 # total number of slip cycles
avg_sim_slip                13.5856 # the average slip between issue and retirement
bpred_bimod.lookups         1654488 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483041 # total number of accesses
il1.hits                   21482862 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944195 # total number of accesses
dl1.hits                    4941064 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       4056 # total number of hits
ul2.misses                       45 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0110 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483041 # total number of accesses
itlb.hits                  21483035 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886420 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:04:06 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860691 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  36812093 # total simulation time in cycles
sim_IPC                      0.7568 # instructions per cycle
sim_CPI                      1.3213 # cycles per instruction
sim_exec_BW                  0.7568 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 147117255 # cumulative IFQ occupancy
IFQ_fcount                 36779064 # cumulative IFQ full count
ifq_occupancy                3.9964 # avg IFQ occupancy (insn's)
ifq_rate                     0.7568 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2805 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9991 # fraction of time (cycle's) IFQ was full
RUU_count                 588469535 # cumulative RUU occupancy
RUU_fcount                 36778504 # cumulative RUU full count
ruu_occupancy               15.9858 # avg RUU occupancy (insn's)
ruu_rate                     0.7568 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.1219 # avg RUU occupant latency (cycle's)
ruu_full                     0.9991 # fraction of time (cycle's) RUU was full
LSQ_count                 176192857 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7863 # avg LSQ occupancy (insn's)
lsq_rate                     0.7568 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3241 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  801171342 # total number of slip cycles
avg_sim_slip                28.7574 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860774 # total number of accesses
il1.hits                   27860563 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5805632 # total number of hits
dl1.misses                  2847766 # total number of misses
dl1.replacements            2846742 # total number of replacements
dl1.writebacks               953640 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3291 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3290 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3801617 # total number of accesses
ul2.hits                    3800533 # total number of hits
ul2.misses                     1084 # total number of misses
ul2.replacements                956 # total number of replacements
ul2.writebacks                  318 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0003 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0003 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860774 # total number of accesses
itlb.hits                  27860768 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017678 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:04:29 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6822716 # total simulation time in cycles
sim_IPC                      1.9211 # instructions per cycle
sim_CPI                      0.5205 # cycles per instruction
sim_exec_BW                  1.9272 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25401712 # cumulative IFQ occupancy
IFQ_fcount                  6224825 # cumulative IFQ full count
ifq_occupancy                3.7231 # avg IFQ occupancy (insn's)
ifq_rate                     1.9272 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9318 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104885675 # cumulative RUU occupancy
RUU_fcount                  5660986 # cumulative RUU full count
ruu_occupancy               15.3730 # avg RUU occupancy (insn's)
ruu_rate                     1.9272 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9767 # avg RUU occupant latency (cycle's)
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
LSQ_count                  32051338 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6977 # avg LSQ occupancy (insn's)
lsq_rate                     1.9272 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4376 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153988053 # total number of slip cycles
avg_sim_slip                11.7484 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357595 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:04:37 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
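
  The range grammar above can be illustrated with a small parser. This is a
  minimal sketch in Python and not part of SimpleScalar: the function names
  and the (kind, value) tuple layout are assumptions made for illustration.
  Values are kept as strings because program symbols such as `main' are
  allowed wherever a number is.

```python
def _endpoint(text, allow_relative):
    """Classify one side of a pipetrace range; an empty side means 'open'."""
    if text == "":
        return None
    if text[0] == "@":
        return ("address", text[1:])
    if text[0] == "#":
        return ("cycle", text[1:])
    if text[0] == "+":
        if not allow_relative:
            raise ValueError("'+' is only valid on the end of the range")
        # Relative to the start value, e.g. 1000:+100 == 1000:1100.
        return ("relative", text[1:])
    # No prefix: an instruction count.
    return ("inst", text)


def parse_ptrace_range(spec):
    """Split '<start>:<end>' into two endpoints (hypothetical helper)."""
    if spec.count(":") != 1:
        raise ValueError("expected exactly one ':' in %r" % spec)
    start, end = spec.split(":")
    return _endpoint(start, False), _endpoint(end, True)


if __name__ == "__main__":
    for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(example, "->", parse_ptrace_range(example))
```

  The five inputs in the demo loop are the same ranges listed in the
  examples above.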

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
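
  As a rough illustration of the <config> format, the following Python sketch
  splits a config string into its five fields and derives the total capacity
  as nsets * bsize * assoc. The class and function names are hypothetical and
  are not part of the simulator; the two strings in the demo are the dl1 and
  dl2 settings from this run's command line.

```python
from typing import NamedTuple

# Replacement policy codes as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str    # one of 'l', 'f', 'r'


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (hypothetical helper)."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields in %r" % spec)
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError("unknown replacement policy %r" % repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity: sets x block size x ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc


if __name__ == "__main__":
    for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l"]:
        cfg = parse_cache_config(spec)
        print(cfg.name, capacity_bytes(cfg) // 1024, "KB", REPL_POLICIES[cfg.repl])
```

  Under this reading, the configuration used in these runs corresponds to a
  64 KB 2-way L1 data cache and a 512 KB 8-way unified L2, both with LRU
  replacement.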

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264709 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6369523 # total simulation time in cycles
sim_IPC                      1.8183 # instructions per cycle
sim_CPI                      0.5500 # cycles per instruction
sim_exec_BW                  1.9255 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18784879 # cumulative IFQ occupancy
IFQ_fcount                  3868060 # cumulative IFQ full count
ifq_occupancy                2.9492 # avg IFQ occupancy (insn's)
ifq_rate                     1.9255 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5316 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6073 # fraction of time (cycle's) IFQ was full
RUU_count                  77385633 # cumulative RUU occupancy
RUU_fcount                  3266133 # cumulative RUU full count
ruu_occupancy               12.1494 # avg RUU occupancy (insn's)
ruu_rate                     1.9255 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3096 # avg RUU occupant latency (cycle's)
ruu_full                     0.5128 # fraction of time (cycle's) RUU was full
LSQ_count                  32153311 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0480 # avg LSQ occupancy (insn's)
lsq_rate                     1.9255 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6216 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123606040 # total number of slip cycles
avg_sim_slip                10.6727 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917920 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
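
The derived figures in the statistics block above (sim_IPC, sim_CPI, and the
per-cache miss rates) can be cross-checked from the raw counters. The Python
sketch below is an illustration only: parse_stats is a hypothetical helper,
and the sample lines are copied from the `sort' run above. It assumes each
statistic is printed as "<name> <value> # <description>" and skips option
lines and non-numeric values such as hex addresses.

```python
def parse_stats(lines):
    """Collect '<name> <value> # comment' lines into a dict of floats."""
    stats = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2 or fields[0].startswith("-"):
            continue  # not a statistic (blank, help text, or option line)
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # e.g. 0x00400000 or 292k
    return stats


SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6369523 # total simulation time in cycles
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
""".splitlines()

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]

if __name__ == "__main__":
    print("IPC %.4f  CPI %.4f" % (ipc, cpi))
    print("dl1 miss rate %.4f  ul2 miss rate %.4f" % (dl1_miss_rate, ul2_miss_rate))
```

The recomputed values agree with the reported sim_IPC, sim_CPI, dl1.miss_rate
and ul2.miss_rate to four decimal places.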

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:04:45 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375132 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9387256 # total simulation time in cycles
sim_IPC                      1.4190 # instructions per cycle
sim_CPI                      0.7047 # cycles per instruction
sim_exec_BW                  1.4248 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  36573399 # cumulative IFQ occupancy
IFQ_fcount                  8992705 # cumulative IFQ full count
ifq_occupancy                3.8961 # avg IFQ occupancy (insn's)
ifq_rate                     1.4248 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7344 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 147206538 # cumulative RUU occupancy
RUU_fcount                  8851782 # cumulative RUU full count
ruu_occupancy               15.6815 # avg RUU occupancy (insn's)
ruu_rate                     1.4248 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.0060 # avg RUU occupant latency (cycle's)
ruu_full                     0.9430 # fraction of time (cycle's) RUU was full
LSQ_count                  77371891 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2422 # avg LSQ occupancy (insn's)
lsq_rate                     1.4248 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7848 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  244293735 # total number of slip cycles
avg_sim_slip                18.3391 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766208 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:04:56 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12580656 # total simulation time in cycles
sim_IPC                      1.6994 # instructions per cycle
sim_CPI                      0.5885 # cycles per instruction
sim_exec_BW                  1.7050 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49171590 # cumulative IFQ occupancy
IFQ_fcount                 11673725 # cumulative IFQ full count
ifq_occupancy                3.9085 # avg IFQ occupancy (insn's)
ifq_rate                     1.7050 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2924 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199823036 # cumulative RUU occupancy
RUU_fcount                 12453600 # cumulative RUU full count
ruu_occupancy               15.8834 # avg RUU occupancy (insn's)
ruu_rate                     1.7050 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3157 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64059687 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0919 # avg LSQ occupancy (insn's)
lsq_rate                     1.7050 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9864 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291665518 # total number of slip cycles
avg_sim_slip                13.6425 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886466 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:1024:64:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:05:09 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861259 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37658972 # total simulation time in cycles
sim_IPC                      0.7398 # instructions per cycle
sim_CPI                      1.3517 # cycles per instruction
sim_exec_BW                  0.7398 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 150559065 # cumulative IFQ occupancy
IFQ_fcount                 37639526 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7398 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4039 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 602239037 # cumulative RUU occupancy
RUU_fcount                 37638788 # cumulative RUU full count
ruu_occupancy               15.9919 # avg RUU occupancy (insn's)
ruu_rate                     0.7398 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.6156 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 183056580 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8609 # avg LSQ occupancy (insn's)
lsq_rate                     0.7398 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5703 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  821804272 # total number of slip cycles
avg_sim_slip                29.4980 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017736 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:05:33 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
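
The range grammar above can be sketched as a small parser. This is an illustrative reading of the help text, not the simulator's own code: the function name `parse_ptrace_range` and the tuple layout are hypothetical, values are kept as strings because program symbols are allowed, and the rule that a `+' end inherits the kind of the start is an assumption.

```python
# Hypothetical helper: parses "{{@|#}<start>}:{{@|#|+}<end>}" as described above.
KINDS = {"@": "address", "#": "cycle"}


def _parse_position(text, allow_relative):
    """Return (kind, value, relative) for one end of the range, or None if omitted."""
    if text == "":
        return None
    prefix = text[0]
    if prefix in KINDS:
        return (KINDS[prefix], text[1:], False)
    if prefix == "+":
        if not allow_relative:
            raise ValueError("'+' is only valid on the end of the range")
        # Kind is filled in by the caller once the start is known.
        return (None, text[1:], True)
    # No prefix: an instruction count.
    return ("instruction", text, False)


def parse_ptrace_range(spec):
    """Split a pipetrace range into (start, end); either may be None."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("a range must contain ':'")
    start = _parse_position(start_text, allow_relative=False)
    end = _parse_position(end_text, allow_relative=True)
    if end is not None and end[2]:
        # Assumption: a relative end is measured in the same units as the start.
        kind = start[0] if start is not None else "instruction"
        end = (kind, end[1], True)
    return start, end


if __name__ == "__main__":
    for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(example, "->", parse_ptrace_range(example))
```

The five ranges in the `__main__` block are the ones listed in the examples above.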

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
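
As a rough aid for reading the `<name>:<nsets>:<bsize>:<assoc>:<repl>` strings above, the sketch below splits a config string and computes the total capacity as nsets * bsize * assoc. The names `CacheConfig` and `parse_cache_config` are hypothetical and not part of SimpleScalar; the sample strings are the ones used on this run's command line.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    """One parsed <name>:<nsets>:<bsize>:<assoc>:<repl> string (hypothetical type)."""
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB config)
    assoc: int
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity: sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse a config string such as 'dl1:512:64:2:l'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:512:128:8:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, {cfg.assoc}-way")
```

Under this reading, the L1 caches in this run are 64 KB each and the unified L2 is 512 KB.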



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13149001 # total number of instructions executed
sim_total_refs              4034210 # total number of loads and stores executed
sim_total_loads             3020646 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6835232 # total simulation time in cycles
sim_IPC                      1.9176 # instructions per cycle
sim_CPI                      0.5215 # cycles per instruction
sim_exec_BW                  1.9237 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25439373 # cumulative IFQ occupancy
IFQ_fcount                  6234227 # cumulative IFQ full count
ifq_occupancy                3.7218 # avg IFQ occupancy (insn's)
ifq_rate                     1.9237 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9347 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9121 # fraction of time (cycle's) IFQ was full
RUU_count                 105035329 # cumulative RUU occupancy
RUU_fcount                  5670354 # cumulative RUU full count
ruu_occupancy               15.3668 # avg RUU occupancy (insn's)
ruu_rate                     1.9237 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9881 # avg RUU occupant latency (cycle's)
ruu_full                     0.8296 # fraction of time (cycle's) RUU was full
LSQ_count                  32102058 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6966 # avg LSQ occupancy (insn's)
lsq_rate                     1.9237 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4414 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154185701 # total number of slip cycles
avg_sim_slip                11.7635 # the average slip between issue and retirement
bpred_bimod.lookups         1010976 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000660 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10190 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189507 # total number of accesses
il1.hits                   13189329 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013166 # total number of accesses
dl1.hits                    4007439 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5247 # total number of hits
ul2.misses                     1150 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1798 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189507 # total number of accesses
itlb.hits                  13189501 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357525 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
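
The derived figures in a statistics block can be cross-checked against the raw counters. The sketch below is one possible way to do that, assuming each statistic is printed as `name value # description`; the function name `parse_stats` is hypothetical, and non-numeric values (hex addresses, `1480k`, the divide-by-zero marker) are simply skipped. The sample lines are copied from the `matrix` run above.

```python
import re

# name, whitespace, a single value token, whitespace, '#', description
STAT_LINE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>\S+)\s+#\s*(?P<desc>.*)$")


def parse_stats(text):
    """Return {stat name: float value} for every numeric statistic line in text."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue  # banners, option dumps, help text, error markers
        try:
            stats[match.group("name")] = float(match.group("value"))
        except ValueError:
            continue  # e.g. 0x00400000 or 1480k
    return stats


SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6835232 # total simulation time in cycles
sim_IPC                      1.9176 # instructions per cycle
sim_CPI                      0.5215 # cycles per instruction
ul2.accesses                   6397 # total number of accesses
ul2.misses                     1150 # total number of misses
ul2.miss_rate                0.1798 # miss rate (i.e., misses/ref)
"""

if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    print("IPC reported", s["sim_IPC"], "recomputed", round(s["sim_num_insn"] / s["sim_cycle"], 4))
    print("CPI reported", s["sim_CPI"], "recomputed", round(s["sim_cycle"] / s["sim_num_insn"], 4))
    print("ul2 miss rate reported", s["ul2.miss_rate"],
          "recomputed", round(s["ul2.misses"] / s["ul2.accesses"], 4))
```

The same function can be applied to each run's block in turn, for example to compare `sim_IPC` across benchmarks under the same memory configuration.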

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:05:41 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264688 # total number of instructions executed
sim_total_refs              4823981 # total number of loads and stores executed
sim_total_loads             2865502 # total number of loads executed
sim_total_stores       1958479.0000 # total number of stores executed
sim_total_branches          3197013 # total number of branches executed
sim_cycle                   6388214 # total simulation time in cycles
sim_IPC                      1.8130 # instructions per cycle
sim_CPI                      0.5516 # cycles per instruction
sim_exec_BW                  1.9199 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18846949 # cumulative IFQ occupancy
IFQ_fcount                  3883484 # cumulative IFQ full count
ifq_occupancy                2.9503 # avg IFQ occupancy (insn's)
ifq_rate                     1.9199 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5367 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6079 # fraction of time (cycle's) IFQ was full
RUU_count                  77634410 # cumulative RUU occupancy
RUU_fcount                  3280536 # cumulative RUU full count
ruu_occupancy               12.1528 # avg RUU occupancy (insn's)
ruu_rate                     1.9199 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3299 # avg RUU occupant latency (cycle's)
ruu_full                     0.5135 # fraction of time (cycle's) RUU was full
LSQ_count                  32210389 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0422 # avg LSQ occupancy (insn's)
lsq_rate                     1.9199 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6263 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123911186 # total number of slip cycles
avg_sim_slip                10.6990 # the average slip between issue and retirement
bpred_bimod.lookups         3257723 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442008 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820334 # total number of accesses
il1.hits                   12820117 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497787 # total number of accesses
dl1.hits                    4486582 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      16388 # total number of hits
ul2.misses                     2113 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1142 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820334 # total number of accesses
itlb.hits                  12820327 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514760 # total number of accesses
dtlb.hits                   4514694 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918402 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:05:48 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
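
The -mem:lat and -mem:width options above combine into a per-block memory latency. The sketch below follows the chunked-transfer model of stock sim-outorder (first chunk at full latency, each further bus-width chunk at the inter-chunk latency). The -mem:maxBurstLength and -mem:minBurstLength options are not part of the stock tool, so their effect is NOT modeled here; the function name and argument names are illustrative only.

```python
def mem_access_latency(block_size, bus_width, first_chunk, inter_chunk):
    """Cycles to move one cache block over the memory bus.

    Assumption: stock chunked model, ignoring any burst-length limits.
    """
    # Number of bus-width transfers needed to move the block (rounded up).
    chunks = (block_size + bus_width - 1) // bus_width
    if chunks <= 0:
        raise ValueError("block_size and bus_width must be positive")
    return first_chunk + inter_chunk * (chunks - 1)


if __name__ == "__main__":
    # Values from this run: ul2 block size 128, -mem:width 8, -mem:lat 69 2
    print(mem_access_latency(128, 8, 69, 2))
```

Under these assumptions a 128-byte ul2 block needs 16 bus transfers, so an L2 miss would cost 69 + 2 * 15 cycles on top of the cache hit latencies.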

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
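
  The range grammar above can be sketched as a small parser. This is an illustration of the documented format only, not the simulator's own parsing code; the names RangeEnd, parse_end and parse_ptrace_range are hypothetical, and values are kept as raw text because program symbols such as `main' are allowed.

```python
from typing import NamedTuple, Optional


class RangeEnd(NamedTuple):
    kind: str             # "addr", "cycle", "inst", "relative" or "open"
    value: Optional[str]  # raw text (number or symbol), None if omitted


# Prefix characters described in the help text.
KINDS = {"@": "addr", "#": "cycle"}


def parse_end(text, allow_relative=False):
    """Classify one side of a <start>:<end> range."""
    if not text:
        return RangeEnd("open", None)
    prefix = text[0]
    if prefix == "+":
        if not allow_relative:
            raise ValueError("'+' is only valid on the end of a range")
        return RangeEnd("relative", text[1:])
    if prefix in KINDS:
        return RangeEnd(KINDS[prefix], text[1:])
    # No prefix: an instruction count.
    return RangeEnd("inst", text)


def parse_ptrace_range(spec):
    """Split '<start>:<end>' and classify both sides."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    return parse_end(start_text), parse_end(end_text, allow_relative=True)


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(spec, parse_ptrace_range(spec))
```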

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
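
  The sample rows above can be turned into -bpred:2lev argument lists. The sketch assumes that N, M and W correspond to <l1size>, <l2size> and <hist_size> in the option line, and that the sample rows are written in the order of the legend lines (N, W, M, X) rather than in option order; both readings are assumptions. Only the samples with M = 2^W are covered, and twolev_args is a hypothetical helper name.

```python
def twolev_args(kind, hist_width, l1_entries=1):
    """Return (l1size, l2size, hist_size, xor) for a sample predictor.

    Assumption: option order is <l1size> <l2size> <hist_size> <xor>.
    """
    l2_entries = 1 << hist_width  # 2^W second-level counters
    if kind == "GAg":
        return (1, l2_entries, hist_width, 0)
    if kind == "PAg":
        return (l1_entries, l2_entries, hist_width, 0)
    if kind == "gshare":
        return (1, l2_entries, hist_width, 1)
    raise ValueError("unsupported sample predictor: " + kind)


if __name__ == "__main__":
    args = twolev_args("gshare", 8)
    print("-bpred:2lev " + " ".join(str(a) for a in args))
```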

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r
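
  A <config> string can be split into its five fields and used to derive the cache capacity (sets x block size x associativity). The sketch below is illustrative: CacheConfig and parse_cache_config are hypothetical names, not part of the simulator.

```python
from dataclasses import dataclass

# Replacement policy letters listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Total data capacity: sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError("expected 5 ':'-separated fields: " + spec)
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError("unknown replacement policy: " + repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configurations used by this run.
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:512:128:8:l"):
        cfg = parse_cache_config(spec)
        print(cfg.name, cfg.capacity_bytes // 1024, "KB", REPL_POLICIES[cfg.repl])
```

  For this run the formula gives 64 KB for dl1 and il1 and 512 KB for ul2.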

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13376440 # total number of instructions executed
sim_total_refs              6748328 # total number of loads and stores executed
sim_total_loads             3824305 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390631 # total number of branches executed
sim_cycle                   9520442 # total simulation time in cycles
sim_IPC                      1.3992 # instructions per cycle
sim_CPI                      0.7147 # cycles per instruction
sim_exec_BW                  1.4050 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  37065516 # cumulative IFQ occupancy
IFQ_fcount                  9115723 # cumulative IFQ full count
ifq_occupancy                3.8933 # avg IFQ occupancy (insn's)
ifq_rate                     1.4050 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7710 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9575 # fraction of time (cycle's) IFQ was full
RUU_count                 149195078 # cumulative RUU occupancy
RUU_fcount                  8974744 # cumulative RUU full count
ruu_occupancy               15.6710 # avg RUU occupancy (insn's)
ruu_rate                     1.4050 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.1536 # avg RUU occupant latency (cycle's)
ruu_full                     0.9427 # fraction of time (cycle's) RUU was full
LSQ_count                  78539071 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2495 # avg LSQ occupancy (insn's)
lsq_rate                     1.4050 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.8714 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  247446245 # total number of slip cycles
avg_sim_slip                18.5758 # the average slip between issue and retirement
bpred_bimod.lookups          390942 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400102 # total number of accesses
il1.hits                   13399375 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169486 # total number of accesses
dl1.hits                    5881762 # total number of hits
dl1.misses                   287724 # total number of misses
dl1.replacements             286700 # total number of replacements
dl1.writebacks               143443 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431894 # total number of accesses
ul2.hits                     415573 # total number of hits
ul2.misses                    16321 # total number of misses
ul2.replacements              12225 # total number of replacements
ul2.writebacks                 9594 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0378 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0283 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0222 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400102 # total number of accesses
itlb.hits                  13400083 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766232 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
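
The statistics lines above share one layout, `name value # description`, so the derived rates can be cross-checked from the raw counters. The following is a minimal sketch, assuming that layout holds for every statistic; parse_stats and parse_value are hypothetical helper names. Values that are not plain numbers (hex addresses, sizes such as 760k, or an error marker) are kept as text.

```python
import re

# name, whitespace, value, whitespace, '#', description
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(.+?)\s+#\s+(.*)$")


def parse_value(text):
    """Convert to int or float when possible, otherwise keep the text."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_stats(lines):
    """Collect statistics into a dict, skipping option and help lines."""
    stats = {}
    for line in lines:
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = parse_value(match.group(2))
    return stats


if __name__ == "__main__":
    sample = [
        "sim_num_insn               13320908 # total number of instructions committed",
        "sim_cycle                   9520442 # total simulation time in cycles",
        "dl1.accesses                6169486 # total number of accesses",
        "dl1.misses                   287724 # total number of misses",
    ]
    stats = parse_stats(sample)
    print("IPC           %.4f" % (stats["sim_num_insn"] / stats["sim_cycle"]))
    print("dl1 miss rate %.4f" % (stats["dl1.misses"] / stats["dl1.accesses"]))
```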

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:06:00 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450128 # total number of instructions executed
sim_total_refs              6589107 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12599205 # total simulation time in cycles
sim_IPC                      1.6969 # instructions per cycle
sim_CPI                      0.5893 # cycles per instruction
sim_exec_BW                  1.7025 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49234204 # cumulative IFQ occupancy
IFQ_fcount                 11688998 # cumulative IFQ full count
ifq_occupancy                3.9077 # avg IFQ occupancy (insn's)
ifq_rate                     1.7025 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2953 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9278 # fraction of time (cycle's) IFQ was full
RUU_count                 200061371 # cumulative RUU occupancy
RUU_fcount                 12467327 # cumulative RUU full count
ruu_occupancy               15.8789 # avg RUU occupancy (insn's)
ruu_rate                     1.7025 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3268 # avg RUU occupant latency (cycle's)
ruu_full                     0.9895 # fraction of time (cycle's) RUU was full
LSQ_count                  64138066 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0906 # avg LSQ occupancy (insn's)
lsq_rate                     1.7025 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9901 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291981773 # total number of slip cycles
avg_sim_slip                13.6572 # the average slip between issue and retirement
bpred_bimod.lookups         1654487 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           57 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483042 # total number of accesses
il1.hits                   21482863 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943699 # total number of accesses
dl1.hits                    4940568 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       2843 # total number of hits
ul2.misses                     1258 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3068 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483042 # total number of accesses
itlb.hits                  21483036 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886422 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:512:128:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:06:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:512:128:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861308 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203829 # total number of loads executed
sim_total_stores       1449774.0000 # total number of stores executed
sim_total_branches           481844 # total number of branches executed
sim_cycle                  37988354 # total simulation time in cycles
sim_IPC                      0.7334 # instructions per cycle
sim_CPI                      1.3636 # cycles per instruction
sim_exec_BW                  0.7334 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 151860695 # cumulative IFQ occupancy
IFQ_fcount                 37964935 # cumulative IFQ full count
ifq_occupancy                3.9976 # avg IFQ occupancy (insn's)
ifq_rate                     0.7334 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4506 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 607447827 # cumulative RUU occupancy
RUU_fcount                 37964182 # cumulative RUU full count
ruu_occupancy               15.9904 # avg RUU occupancy (insn's)
ruu_rate                     0.7334 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.8026 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 184643094 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8605 # avg LSQ occupancy (insn's)
lsq_rate                     0.7334 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6272 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  828596459 # total number of slip cycles
avg_sim_slip                29.7418 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6083842 # total number of hits
dl1.misses                  2569556 # total number of misses
dl1.replacements            2568532 # total number of replacements
dl1.writebacks               957260 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2969 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2968 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3527027 # total number of accesses
ul2.hits                    3492843 # total number of hits
ul2.misses                    34184 # total number of misses
ul2.replacements              30088 # total number of replacements
ul2.writebacks                10009 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0097 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0085 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0028 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017732 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:06:36 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149022 # total number of instructions executed
sim_total_refs              4034214 # total number of loads and stores executed
sim_total_loads             3020650 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   6835788 # total simulation time in cycles
sim_IPC                      1.9174 # instructions per cycle
sim_CPI                      0.5215 # cycles per instruction
sim_exec_BW                  1.9236 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25429090 # cumulative IFQ occupancy
IFQ_fcount                  6231658 # cumulative IFQ full count
ifq_occupancy                3.7200 # avg IFQ occupancy (insn's)
ifq_rate                     1.9236 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9339 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9116 # fraction of time (cycle's) IFQ was full
RUU_count                 105003106 # cumulative RUU occupancy
RUU_fcount                  5667753 # cumulative RUU full count
ruu_occupancy               15.3608 # avg RUU occupancy (insn's)
ruu_rate                     1.9236 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9856 # avg RUU occupant latency (cycle's)
ruu_full                     0.8291 # fraction of time (cycle's) RUU was full
LSQ_count                  32095104 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6952 # avg LSQ occupancy (insn's)
lsq_rate                     1.9236 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4409 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154138033 # total number of slip cycles
avg_sim_slip                11.7599 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189528 # total number of accesses
il1.hits                   13189350 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013184 # total number of accesses
dl1.hits                    4007457 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5811 # total number of hits
ul2.misses                      586 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0916 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189528 # total number of accesses
itlb.hits                  13189522 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357617 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:06:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
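
  The range syntax above can be illustrated with a small parser. This is a
  minimal sketch based only on the help text, not the simulator's own code;
  the names `Endpoint`, `parse_endpoint`, and `parse_ptrace_range` are
  hypothetical. Values are kept as text because program symbols (such as
  `main`) are allowed, and resolving them would need the program's symbol
  table.

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    kind: str   # "addr", "cycle", "inst", or "relative"
    value: str  # left as text: may be a number or a program symbol


# Prefix character -> meaning, as described in the help text.
_KINDS = {"@": "addr", "#": "cycle", "+": "relative", "": "inst"}


def parse_endpoint(text: str, allow_relative: bool) -> Optional[Endpoint]:
    """Parse one side of a range; an empty side means 'unbounded'."""
    if text == "":
        return None
    prefix = text[0] if text[0] in "@#+" else ""
    if prefix == "+" and not allow_relative:
        raise ValueError("'+' is only valid on the end of a range")
    value = text[len(prefix):]
    if value == "":
        raise ValueError(f"missing value after {prefix!r}")
    return Endpoint(_KINDS[prefix], value)


def parse_ptrace_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' and classify each side."""
    start_text, sep, end_text = spec.partition(":")
    if sep != ":":
        raise ValueError("a range must contain ':'")
    return parse_endpoint(start_text, False), parse_endpoint(end_text, True)


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(spec, "->", parse_ptrace_range(spec))
```

  A `relative` end is left unresolved here; per the help text it would be
  added to the start value, e.g., 1000:+100 == 1000:1100.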

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r
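
  As an illustration of the <config> format above, the following sketch
  splits a config string into its five fields and derives the data capacity
  as nsets * bsize * assoc. It is an assumption-laden example rather than
  the simulator's parser; `CacheConfig` and `parse_cache_config` are
  hypothetical names.

```python
from dataclasses import dataclass

# Replacement-policy codes listed in the help text.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Data capacity only; tag and status bits are not counted.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(config: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = config.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {len(parts)}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # The two data-side configs used by the runs in this log.
    for text in ("dl1:512:64:2:l", "ul2:256:256:8:l"):
        cfg = parse_cache_config(text)
        print(cfg.name, cfg.capacity_bytes // 1024, "KiB", REPLACEMENT[cfg.repl])
```

  Under this reading, dl1:512:64:2:l is a 64 KiB two-way cache and
  ul2:256:256:8:l is a 512 KiB eight-way cache.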

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264802 # total number of instructions executed
sim_total_refs              4823974 # total number of loads and stores executed
sim_total_loads             2865500 # total number of loads executed
sim_total_stores       1958474.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6400255 # total simulation time in cycles
sim_IPC                      1.8095 # instructions per cycle
sim_CPI                      0.5526 # cycles per instruction
sim_exec_BW                  1.9163 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18879184 # cumulative IFQ occupancy
IFQ_fcount                  3891497 # cumulative IFQ full count
ifq_occupancy                2.9498 # avg IFQ occupancy (insn's)
ifq_rate                     1.9163 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5393 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6080 # fraction of time (cycle's) IFQ was full
RUU_count                  77766177 # cumulative RUU occupancy
RUU_fcount                  3288005 # cumulative RUU full count
ruu_occupancy               12.1505 # avg RUU occupancy (insn's)
ruu_rate                     1.9163 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3406 # avg RUU occupant latency (cycle's)
ruu_full                     0.5137 # fraction of time (cycle's) RUU was full
LSQ_count                  32243126 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0378 # avg LSQ occupancy (insn's)
lsq_rate                     1.9163 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6289 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124075398 # total number of slip cycles
avg_sim_slip                10.7132 # the average slip between issue and retirement
bpred_bimod.lookups         3257753 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442040 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820379 # total number of accesses
il1.hits                   12820162 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497786 # total number of accesses
dl1.hits                    4486581 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      17430 # total number of hits
ul2.misses                     1071 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0579 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820379 # total number of accesses
itlb.hits                  12820372 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918580 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
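
The derived figures in a statistics block like the one above (IPC, miss rates) can be recomputed from the raw counters as a sanity check. The sketch below assumes each statistic line has the shape `name value # description`; the helper name `parse_stats` is hypothetical, and the sample lines are copied from the sort run above.

```python
def parse_stats(text: str) -> dict:
    """Collect 'name value # description' lines into a name -> float map."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # banners, help text, commented-out options
        name, value = fields
        if name.startswith("-"):
            continue  # option dump lines, not statistics
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # non-decimal values such as 0x00400000 or 292k
    return stats


SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6400255 # total simulation time in cycles
dl1.accesses                4497786 # total number of accesses
dl1.misses                    11205 # total number of misses
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    ipc = stats["sim_num_insn"] / stats["sim_cycle"]
    dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
    print(f"IPC           {ipc:.4f}")
    print(f"dl1 miss rate {dl1_miss_rate:.4f}")
```

Rounded to four places, these agree with the reported sim_IPC (1.8095) and dl1.miss_rate (0.0025) for the sort run.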

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:06:53 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13377150 # total number of instructions executed
sim_total_refs              6748328 # total number of loads and stores executed
sim_total_loads             3824305 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                   9604196 # total simulation time in cycles
sim_IPC                      1.3870 # instructions per cycle
sim_CPI                      0.7210 # cycles per instruction
sim_exec_BW                  1.3928 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  37360195 # cumulative IFQ occupancy
IFQ_fcount                  9189376 # cumulative IFQ full count
ifq_occupancy                3.8900 # avg IFQ occupancy (insn's)
ifq_rate                     1.3928 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7928 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9568 # fraction of time (cycle's) IFQ was full
RUU_count                 150380351 # cumulative RUU occupancy
RUU_fcount                  9048260 # cumulative RUU full count
ruu_occupancy               15.6578 # avg RUU occupancy (insn's)
ruu_rate                     1.3928 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.2416 # avg RUU occupant latency (cycle's)
ruu_full                     0.9421 # fraction of time (cycle's) RUU was full
LSQ_count                  79218662 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2483 # avg LSQ occupancy (insn's)
lsq_rate                     1.3928 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.9219 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  249299890 # total number of slip cycles
avg_sim_slip                18.7149 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380511 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89851 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90457 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89851 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169507 # total number of accesses
dl1.hits                    5881781 # total number of hits
dl1.misses                   287726 # total number of misses
dl1.replacements             286702 # total number of replacements
dl1.writebacks               143445 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431898 # total number of accesses
ul2.hits                     423631 # total number of hits
ul2.misses                     8267 # total number of misses
ul2.replacements               6219 # total number of replacements
ul2.writebacks                 4885 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0191 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0144 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0113 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736809 # total number of hits
dtlb.misses                    4189 # total number of misses
dtlb.replacements              4061 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766212 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
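
The statistics lines above all follow one shape: a name, a value, then a `#` comment. A minimal Python sketch for reading such lines back, assuming that shape holds for every statistic in the dump. The name parse_stat_line is made up for this illustration and is not part of SimpleScalar.

```python
import re

# Assumed line shape, inferred from the listing: "<name> <value> # <description>".
STAT_LINE = re.compile(r"^(?P<name>\S+)\s+(?P<value>.+?)\s+#\s+(?P<desc>.*)$")


def parse_stat_line(line):
    """Return (name, value, description), or None if the line is not a statistic."""
    # Option lines start with '-' or '#'; help text is indented. Skip both.
    if line.startswith(("#", "-")):
        return None
    m = STAT_LINE.match(line)
    if m is None:
        return None
    raw = m.group("value").strip()
    try:
        value = int(raw, 0)        # decimal and 0x-prefixed values
    except ValueError:
        try:
            value = float(raw)     # rates such as 0.0466
        except ValueError:
            value = raw            # e.g. "760k" or "<error: divide by zero>"
    return m.group("name"), value, m.group("desc")
```

Values that are neither integers nor floats are kept as strings, so nothing in the dump is silently dropped.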

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:07:04 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
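
A possible reading of the range syntax above, sketched in Python. The function parse_ptrace_range and its return shape are hypothetical; endpoint values are kept as strings because program symbols such as `main` are allowed, and no attempt is made to resolve them.

```python
def parse_ptrace_range(arg):
    """Split a pipetrace range into ((start_kind, start), (end_kind, end)).

    An omitted endpoint is reported as (None, None).
    """
    if ":" not in arg:
        raise ValueError("a range needs a ':' separator")
    start, end = arg.split(":", 1)
    kinds = {"@": "address", "#": "cycle"}

    def classify(text, allow_relative):
        if text == "":
            return (None, None)                  # endpoint left out
        if allow_relative and text[0] == "+":
            return ("relative", text[1:])        # offset from the start value
        if text[0] in kinds:
            return (kinds[text[0]], text[1:])
        return ("instruction", text)             # plain instruction count

    return classify(start, False), classify(end, True)
```

For instance, `parse_ptrace_range("@main:+278")` separates the address symbol `main` from the relative end value `278`.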

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (values listed as N, W, M, X):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
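
The sample table above can be turned into a small lookup. This Python sketch takes the four values by name to sidestep ordering questions; classify_two_level is a hypothetical helper, and the mapping of the -bpred:2lev fields <l1size>, <l2size>, <hist_size> onto N, M, W is inferred from the descriptions in this help text.

```python
def classify_two_level(n, w, m, x):
    """Name the sample predictor matching (N, W, M, X), or None if none does.

    n: entries in the first level, w: shift register width,
    m: entries in the second level, x: 1 to xor history with address.
    """
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if x == 1:
        return "gshare" if n == 1 and m == 2 ** w else None
    if n == 1:
        if m == 2 ** w:
            return "GAg"
        return "GAp" if m > 2 ** w else None
    if m == 2 ** w:
        return "PAg"
    return "PAp" if m == 2 ** (n + w) else None


# The -bpred:2lev values shown in this run: l1size=1, l2size=1024, hist_size=8, xor=0.
print(classify_two_level(n=1, w=8, m=1024, x=0))
```

Under that reading, the run's 2-level setting falls in the GAp row, since 1024 exceeds 2^8. The setting is unused here because -bpred is bimod.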

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
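
As an illustration of the <config> format, the Python sketch below splits a config string into its five fields and derives the capacity as nsets * bsize * assoc. The names CacheConfig, parse_cache_config and capacity_bytes are assumptions made for this example only.

```python
from collections import namedtuple

CacheConfig = namedtuple("CacheConfig", "name nsets bsize assoc repl")

# Replacement codes as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError("expected five ':'-separated fields, got %r" % text)
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError("unknown replacement policy %r" % repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity: sets x block size x ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc


# Configurations used on this run's command line.
for text in ("dl1:512:64:2:l", "ul2:256:256:8:l"):
    cfg = parse_cache_config(text)
    print(cfg.name, capacity_bytes(cfg) // 1024, "KiB", REPL_POLICIES[cfg.repl])
```

For a TLB config the same product gives the address range covered, since <bsize> is the page size there.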

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
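
The pointing scheme above can be pictured as a name lookup. The Python sketch below is only a model of the idea, not the simulator's implementation; resolve_icache is a hypothetical helper that replaces "dl1" or "dl2" with the data cache config it refers to.

```python
def resolve_icache(il1, il2, dl1, dl2):
    """Map each instruction cache level to the config string it ends up using.

    il1 and il2 may be a full config string or one of the pointers
    "dl1" / "dl2"; any other value is passed through unchanged.
    """
    targets = {"dl1": dl1, "dl2": dl2}
    return {
        "il1": targets.get(il1, il1),
        "il2": targets.get(il2, il2),
    }


# Settings of this run: separate L1 caches, il2 pointed at dl2.
print(resolve_icache("il1:512:64:2:l", "dl2",
                     "dl1:512:64:2:l", "ul2:256:256:8:l"))
```

With this run's options the lookup leaves il1 as its own cache and maps il2 onto the ul2 config, which matches the single ul2 statistics group in the output.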



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450141 # total number of instructions executed
sim_total_refs              6589113 # total number of loads and stores executed
sim_total_loads             4939056 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12609740 # total simulation time in cycles
sim_IPC                      1.6955 # instructions per cycle
sim_CPI                      0.5898 # cycles per instruction
sim_exec_BW                  1.7011 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49262241 # cumulative IFQ occupancy
IFQ_fcount                 11695813 # cumulative IFQ full count
ifq_occupancy                3.9067 # avg IFQ occupancy (insn's)
ifq_rate                     1.7011 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2966 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9275 # fraction of time (cycle's) IFQ was full
RUU_count                 200172053 # cumulative RUU occupancy
RUU_fcount                 12473374 # cumulative RUU full count
ruu_occupancy               15.8744 # avg RUU occupancy (insn's)
ruu_rate                     1.7011 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3320 # avg RUU occupant latency (cycle's)
ruu_full                     0.9892 # fraction of time (cycle's) RUU was full
LSQ_count                  64174457 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0893 # avg LSQ occupancy (insn's)
lsq_rate                     1.7011 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9918 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292121323 # total number of slip cycles
avg_sim_slip                13.6638 # the average slip between issue and retirement
bpred_bimod.lookups         1654491 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483056 # total number of accesses
il1.hits                   21482877 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943957 # total number of accesses
dl1.hits                    4940826 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3460 # total number of hits
ul2.misses                      641 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1563 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483056 # total number of accesses
itlb.hits                  21483050 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886482 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
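
Several of the derived figures in the `filter` statistics above can be recomputed from the raw counters. The Python sketch below does so; the function names are made up for this check. The queue formulas and the L2 traffic sum are inferred from the numbers in this listing (179 + 3131 + 791 = 4101), not taken from SimpleScalar documentation.

```python
def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero.

    The zero case corresponds to the "<error: divide by zero>" entry
    printed for bpred_jr_non_ras_rate in this run.
    """
    return numerator / denominator if denominator else None


def throughput(num_insn, total_insn, num_branches, cycles):
    """Committed and executed throughput figures."""
    return {
        "sim_IPC": ratio(num_insn, cycles),
        "sim_CPI": ratio(cycles, num_insn),
        "sim_exec_BW": ratio(total_insn, cycles),
        "sim_IPB": ratio(num_insn, num_branches),
    }


def queue_stats(count, fcount, total_insn, cycles):
    """Occupancy, dispatch rate, occupant latency and full fraction of a queue."""
    occupancy = ratio(count, cycles)
    rate = ratio(total_insn, cycles)
    return {
        "occupancy": occupancy,
        "rate": rate,
        "latency": ratio(occupancy, rate),
        "full": ratio(fcount, cycles),
    }


def l2_accesses(il1_misses, dl1_misses, dl1_writebacks):
    """L2 accesses implied by the L1 counters (an observation from this log)."""
    return il1_misses + dl1_misses + dl1_writebacks


# Raw counters copied from the `filter` run.
t = throughput(21379256, 21450141, 1647342, 12609740)
ifq = queue_stats(49262241, 11695813, 21450141, 12609740)
print({k: round(v, 4) for k, v in t.items()})
print({k: round(v, 4) for k, v in ifq.items()})
print(l2_accesses(179, 3131, 791), ratio(641, 4101), ratio(0, 0))
```

Rounded to four places, the results agree with the sim_IPC, sim_CPI, ifq_* and ul2.miss_rate values reported above.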

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:07:17 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27863167 # total number of instructions executed
sim_total_refs              8653608 # total number of loads and stores executed
sim_total_loads             7203833 # total number of loads executed
sim_total_stores       1449775.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  38364863 # total simulation time in cycles
sim_IPC                      0.7262 # instructions per cycle
sim_CPI                      1.3771 # cycles per instruction
sim_exec_BW                  0.7263 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 153342806 # cumulative IFQ occupancy
IFQ_fcount                 38335231 # cumulative IFQ full count
ifq_occupancy                3.9970 # avg IFQ occupancy (insn's)
ifq_rate                     0.7263 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5034 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 613376471 # cumulative RUU occupancy
RUU_fcount                 38334324 # cumulative RUU full count
ruu_occupancy               15.9880 # avg RUU occupancy (insn's)
ruu_rate                     0.7263 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.0139 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 185012655 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8225 # avg LSQ occupancy (insn's)
lsq_rate                     0.7263 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6400 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  834886527 # total number of slip cycles
avg_sim_slip                29.9675 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          123 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860802 # total number of accesses
il1.hits                   27860591 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5940463 # total number of hits
dl1.misses                  2712938 # total number of misses
dl1.replacements            2711914 # total number of replacements
dl1.writebacks               955399 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3135 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3134 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3668548 # total number of accesses
ul2.hits                    3651434 # total number of hits
ul2.misses                    17114 # total number of misses
ul2.replacements              15066 # total number of replacements
ul2.writebacks                 5007 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0047 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0041 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0014 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860802 # total number of accesses
itlb.hits                  27860796 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017796 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:07:41 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13149022 # total number of instructions executed
sim_total_refs              4034214 # total number of loads and stores executed
sim_total_loads             3020650 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   6835788 # total simulation time in cycles
sim_IPC                      1.9174 # instructions per cycle
sim_CPI                      0.5215 # cycles per instruction
sim_exec_BW                  1.9236 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25429090 # cumulative IFQ occupancy
IFQ_fcount                  6231658 # cumulative IFQ full count
ifq_occupancy                3.7200 # avg IFQ occupancy (insn's)
ifq_rate                     1.9236 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9339 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9116 # fraction of time (cycle's) IFQ was full
RUU_count                 105003106 # cumulative RUU occupancy
RUU_fcount                  5667753 # cumulative RUU full count
ruu_occupancy               15.3608 # avg RUU occupancy (insn's)
ruu_rate                     1.9236 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9856 # avg RUU occupant latency (cycle's)
ruu_full                     0.8291 # fraction of time (cycle's) RUU was full
LSQ_count                  32095104 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6952 # avg LSQ occupancy (insn's)
lsq_rate                     1.9236 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4409 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154138033 # total number of slip cycles
avg_sim_slip                11.7599 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189528 # total number of accesses
il1.hits                   13189350 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013184 # total number of accesses
dl1.hits                    4007457 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       5811 # total number of hits
ul2.misses                      586 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0916 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189528 # total number of accesses
itlb.hits                  13189522 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357617 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:07:49 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264802 # total number of instructions executed
sim_total_refs              4823974 # total number of loads and stores executed
sim_total_loads             2865500 # total number of loads executed
sim_total_stores       1958474.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6403137 # total simulation time in cycles
sim_IPC                      1.8087 # instructions per cycle
sim_CPI                      0.5529 # cycles per instruction
sim_exec_BW                  1.9154 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18879184 # cumulative IFQ occupancy
IFQ_fcount                  3891497 # cumulative IFQ full count
ifq_occupancy                2.9484 # avg IFQ occupancy (insn's)
ifq_rate                     1.9154 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5393 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6077 # fraction of time (cycle's) IFQ was full
RUU_count                  77766157 # cumulative RUU occupancy
RUU_fcount                  3288005 # cumulative RUU full count
ruu_occupancy               12.1450 # avg RUU occupancy (insn's)
ruu_rate                     1.9154 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3406 # avg RUU occupant latency (cycle's)
ruu_full                     0.5135 # fraction of time (cycle's) RUU was full
LSQ_count                  32243118 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0355 # avg LSQ occupancy (insn's)
lsq_rate                     1.9154 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6289 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124075370 # total number of slip cycles
avg_sim_slip                10.7132 # the average slip between issue and retirement
bpred_bimod.lookups         3257753 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442040 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820380 # total number of accesses
il1.hits                   12820163 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497786 # total number of accesses
dl1.hits                    4486581 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      17421 # total number of hits
ul2.misses                     1080 # total number of misses
ul2.replacements                 70 # total number of replacements
ul2.writebacks                   42 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0584 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0038 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820380 # total number of accesses
itlb.hits                  12820373 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918584 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:07:57 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1024685.2308 # simulation speed (in insts/sec)
sim_total_insn             13377159 # total number of instructions executed
sim_total_refs              6748332 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  13330926 # total simulation time in cycles
sim_IPC                      0.9992 # instructions per cycle
sim_CPI                      1.0008 # cycles per instruction
sim_exec_BW                  1.0035 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  52268294 # cumulative IFQ occupancy
IFQ_fcount                 12916106 # cumulative IFQ full count
ifq_occupancy                3.9208 # avg IFQ occupancy (insn's)
ifq_rate                     1.0035 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.9073 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9689 # fraction of time (cycle's) IFQ was full
RUU_count                 210027817 # cumulative RUU occupancy
RUU_fcount                 12775770 # cumulative RUU full count
ruu_occupancy               15.7549 # avg RUU occupancy (insn's)
ruu_rate                     1.0035 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 15.7005 # avg RUU occupant latency (cycle's)
ruu_full                     0.9584 # fraction of time (cycle's) RUU was full
LSQ_count                 110667048 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3015 # avg LSQ occupancy (insn's)
lsq_rate                     1.0035 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  8.2728 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  340395720 # total number of slip cycles
avg_sim_slip                25.5535 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380511 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89851 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90457 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89851 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400106 # total number of accesses
il1.hits                   13399379 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169520 # total number of accesses
dl1.hits                    5881789 # total number of hits
dl1.misses                   287731 # total number of misses
dl1.replacements             286707 # total number of replacements
dl1.writebacks               143445 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431903 # total number of accesses
ul2.hits                     404016 # total number of hits
ul2.misses                    27887 # total number of misses
ul2.replacements              26863 # total number of replacements
ul2.writebacks                23516 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0646 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0622 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0544 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400106 # total number of accesses
itlb.hits                  13400087 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736809 # total number of hits
dtlb.misses                    4189 # total number of misses
dtlb.replacements              4061 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766260 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:08:10 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450141 # total number of instructions executed
sim_total_refs              6589113 # total number of loads and stores executed
sim_total_loads             4939056 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12609740 # total simulation time in cycles
sim_IPC                      1.6955 # instructions per cycle
sim_CPI                      0.5898 # cycles per instruction
sim_exec_BW                  1.7011 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49262241 # cumulative IFQ occupancy
IFQ_fcount                 11695813 # cumulative IFQ full count
ifq_occupancy                3.9067 # avg IFQ occupancy (insn's)
ifq_rate                     1.7011 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2966 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9275 # fraction of time (cycle's) IFQ was full
RUU_count                 200172053 # cumulative RUU occupancy
RUU_fcount                 12473374 # cumulative RUU full count
ruu_occupancy               15.8744 # avg RUU occupancy (insn's)
ruu_rate                     1.7011 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3320 # avg RUU occupant latency (cycle's)
ruu_full                     0.9892 # fraction of time (cycle's) RUU was full
LSQ_count                  64174457 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0893 # avg LSQ occupancy (insn's)
lsq_rate                     1.7011 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9918 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292121323 # total number of slip cycles
avg_sim_slip                13.6638 # the average slip between issue and retirement
bpred_bimod.lookups         1654491 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483056 # total number of accesses
il1.hits                   21482877 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943957 # total number of accesses
dl1.hits                    4940826 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3460 # total number of hits
ul2.misses                      641 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1563 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483056 # total number of accesses
itlb.hits                  21483050 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886482 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:256:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:08:23 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:256:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
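
  The range format above can be sketched as a small parser.  This is an
  illustrative Python sketch, not the simulator's own code: the names
  parse_side and parse_range are hypothetical, and values are kept as
  strings because program symbols such as `main' are allowed.

```python
def parse_side(text, allow_plus):
    """Split one side of a range into (prefix, value); None if omitted."""
    if not text:
        return None
    prefixes = "@#+" if allow_plus else "@#"
    if text[0] in prefixes:
        # '@' = address, '#' = cycle count, '+' = relative to the start
        return (text[0], text[1:])
    # No prefix: an instruction count (or a symbol)
    return ("", text)


def parse_range(spec):
    """Parse '<start>:<end>' where either side may be empty."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':': %r" % spec)
    return parse_side(start, allow_plus=False), parse_side(end, allow_plus=True)
```

  For example, parse_range("@main:+278") yields (("@", "main"), ("+", "278")),
  and parse_range(":") yields (None, None), i.e., trace the entire execution.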

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (in N, M, W, X order, matching -bpred:2lev):
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
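
  As a reading aid, the <config> string can be decoded and its total
  capacity computed as nsets * bsize * assoc.  The following Python sketch
  is illustrative only; the names CacheConfig, parse_cache_config and
  capacity_bytes are hypothetical and not part of the simulator.

```python
from typing import NamedTuple

# Replacement policy letters as listed in the help text above
REPL_NAMES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields: %r" % text)
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_NAMES:
        raise ValueError("unknown replacement policy: %r" % repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity: sets x block size x ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc
```

  With the settings of this run, dl1:512:64:2:l works out to 65536 bytes
  (64 KB) and ul2:128:256:8:l to 262144 bytes (256 KB).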

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27863167 # total number of instructions executed
sim_total_refs              8653608 # total number of loads and stores executed
sim_total_loads             7203833 # total number of loads executed
sim_total_stores       1449775.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  38364863 # total simulation time in cycles
sim_IPC                      0.7262 # instructions per cycle
sim_CPI                      1.3771 # cycles per instruction
sim_exec_BW                  0.7263 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 153342806 # cumulative IFQ occupancy
IFQ_fcount                 38335231 # cumulative IFQ full count
ifq_occupancy                3.9970 # avg IFQ occupancy (insn's)
ifq_rate                     0.7263 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5034 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 613376471 # cumulative RUU occupancy
RUU_fcount                 38334324 # cumulative RUU full count
ruu_occupancy               15.9880 # avg RUU occupancy (insn's)
ruu_rate                     0.7263 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.0139 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 185012655 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8225 # avg LSQ occupancy (insn's)
lsq_rate                     0.7263 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6400 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  834886527 # total number of slip cycles
avg_sim_slip                29.9675 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes          123 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860802 # total number of accesses
il1.hits                   27860591 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5940463 # total number of hits
dl1.misses                  2712938 # total number of misses
dl1.replacements            2711914 # total number of replacements
dl1.writebacks               955399 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3135 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3134 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3668548 # total number of accesses
ul2.hits                    3651434 # total number of hits
ul2.misses                    17114 # total number of misses
ul2.replacements              16090 # total number of replacements
ul2.writebacks                 5391 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0047 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0044 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860802 # total number of accesses
itlb.hits                  27860796 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017796 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
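
The derived rates in a statistics block like the one above can be
cross-checked from the raw counters.  The Python sketch below is an
illustration under stated assumptions: parse_stats and STAT_LINE are
hypothetical names, only purely numeric values are kept, and entries such
as hex addresses, `120k', or `<error: divide by zero>' are skipped.

```python
import re

# "<name> <value> # <description>"; names start with a letter or underscore,
# so option lines beginning with '-' or '#' are ignored.
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#")


def parse_stats(lines):
    """Collect numeric statistics from simulator output lines into a dict."""
    stats = {}
    for line in lines:
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            continue  # non-numeric value (hex address, size suffix, ...)
    return stats


sample = [
    "sim_num_insn               27859692 # total number of instructions committed",
    "sim_cycle                  38364863 # total simulation time in cycles",
    "dl1.accesses                8653401 # total number of accesses",
    "dl1.misses                  2712938 # total number of misses",
    "bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR rate",
]
stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print("IPC %.4f, dl1 miss rate %.4f" % (ipc, dl1_miss_rate))
```

Rounded to four places, these agree with the reported sim_IPC (0.7262)
and dl1.miss_rate (0.3135) for the alphaBlend run.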

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:08:47 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148995 # total number of instructions executed
sim_total_refs              4034201 # total number of loads and stores executed
sim_total_loads             3020644 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6865040 # total simulation time in cycles
sim_IPC                      1.9093 # instructions per cycle
sim_CPI                      0.5238 # cycles per instruction
sim_exec_BW                  1.9154 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25504479 # cumulative IFQ occupancy
IFQ_fcount                  6250520 # cumulative IFQ full count
ifq_occupancy                3.7151 # avg IFQ occupancy (insn's)
ifq_rate                     1.9154 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9397 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9105 # fraction of time (cycle's) IFQ was full
RUU_count                 105309935 # cumulative RUU occupancy
RUU_fcount                  5686602 # cumulative RUU full count
ruu_occupancy               15.3400 # avg RUU occupancy (insn's)
ruu_rate                     1.9154 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0090 # avg RUU occupant latency (cycle's)
ruu_full                     0.8283 # fraction of time (cycle's) RUU was full
LSQ_count                  32189372 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6889 # avg LSQ occupancy (insn's)
lsq_rate                     1.9154 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4480 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154551654 # total number of slip cycles
avg_sim_slip                11.7914 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189495 # total number of accesses
il1.hits                   13189317 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013188 # total number of accesses
dl1.hits                    4007461 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6239 # total number of hits
ul2.misses                      158 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0247 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189495 # total number of accesses
itlb.hits                  13189489 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357475 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:08:55 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264893 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865507 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6420820 # total simulation time in cycles
sim_IPC                      1.8037 # instructions per cycle
sim_CPI                      0.5544 # cycles per instruction
sim_exec_BW                  1.9102 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18912751 # cumulative IFQ occupancy
IFQ_fcount                  3899858 # cumulative IFQ full count
ifq_occupancy                2.9455 # avg IFQ occupancy (insn's)
ifq_rate                     1.9102 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5420 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6074 # fraction of time (cycle's) IFQ was full
RUU_count                  77911167 # cumulative RUU occupancy
RUU_fcount                  3295968 # cumulative RUU full count
ruu_occupancy               12.1341 # avg RUU occupancy (insn's)
ruu_rate                     1.9102 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3524 # avg RUU occupant latency (cycle's)
ruu_full                     0.5133 # fraction of time (cycle's) RUU was full
LSQ_count                  32287831 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0286 # avg LSQ occupancy (insn's)
lsq_rate                     1.9102 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6325 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124264875 # total number of slip cycles
avg_sim_slip                10.7296 # the average slip between issue and retirement
bpred_bimod.lookups         3257776 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442060 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820428 # total number of accesses
il1.hits                   12820211 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18220 # total number of hits
ul2.misses                      281 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0152 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820428 # total number of accesses
itlb.hits                  12820421 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918790 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:09:02 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13379868 # total number of instructions executed
sim_total_refs              6748334 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                   9801392 # total simulation time in cycles
sim_IPC                      1.3591 # instructions per cycle
sim_CPI                      0.7358 # cycles per instruction
sim_exec_BW                  1.3651 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  37990354 # cumulative IFQ occupancy
IFQ_fcount                  9346887 # cumulative IFQ full count
ifq_occupancy                3.8760 # avg IFQ occupancy (insn's)
ifq_rate                     1.3651 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.8394 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9536 # fraction of time (cycle's) IFQ was full
RUU_count                 152942227 # cumulative RUU occupancy
RUU_fcount                  9204847 # cumulative RUU full count
ruu_occupancy               15.6041 # avg RUU occupancy (insn's)
ruu_rate                     1.3651 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.4308 # avg RUU occupant latency (cycle's)
ruu_full                     0.9391 # fraction of time (cycle's) RUU was full
LSQ_count                  80609650 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2243 # avg LSQ occupancy (insn's)
lsq_rate                     1.3651 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.0247 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  253235834 # total number of slip cycles
avg_sim_slip                19.0104 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400111 # total number of accesses
il1.hits                   13399384 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169524 # total number of accesses
dl1.hits                    5881797 # total number of hits
dl1.misses                   287727 # total number of misses
dl1.replacements             286703 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431898 # total number of accesses
ul2.hits                     429662 # total number of hits
ul2.misses                     2236 # total number of misses
ul2.replacements               1724 # total number of replacements
ul2.writebacks                 1367 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0052 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0040 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0032 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400111 # total number of accesses
itlb.hits                  13400092 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736797 # total number of hits
dtlb.misses                    4201 # total number of misses
dtlb.replacements              4073 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766276 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:09:14 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450120 # total number of instructions executed
sim_total_refs              6589103 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12626565 # total simulation time in cycles
sim_IPC                      1.6932 # instructions per cycle
sim_CPI                      0.5906 # cycles per instruction
sim_exec_BW                  1.6988 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49287664 # cumulative IFQ occupancy
IFQ_fcount                 11702027 # cumulative IFQ full count
ifq_occupancy                3.9035 # avg IFQ occupancy (insn's)
ifq_rate                     1.6988 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2978 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9268 # fraction of time (cycle's) IFQ was full
RUU_count                 200278864 # cumulative RUU occupancy
RUU_fcount                 12479026 # cumulative RUU full count
ruu_occupancy               15.8617 # avg RUU occupancy (insn's)
ruu_rate                     1.6988 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3370 # avg RUU occupant latency (cycle's)
ruu_full                     0.9883 # fraction of time (cycle's) RUU was full
LSQ_count                  64214318 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0857 # avg LSQ occupancy (insn's)
lsq_rate                     1.6988 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9937 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292276470 # total number of slip cycles
avg_sim_slip                13.6710 # the average slip between issue and retirement
bpred_bimod.lookups         1654485 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           77 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483029 # total number of accesses
il1.hits                   21482850 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944146 # total number of accesses
dl1.hits                    4941015 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3929 # total number of hits
ul2.misses                      172 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0419 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483029 # total number of accesses
itlb.hits                  21483023 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886370 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:09:27 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27865955 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  38664629 # total simulation time in cycles
sim_IPC                      0.7205 # instructions per cycle
sim_CPI                      1.3878 # cycles per instruction
sim_exec_BW                  0.7207 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 154472439 # cumulative IFQ occupancy
IFQ_fcount                 38617868 # cumulative IFQ full count
ifq_occupancy                3.9952 # avg IFQ occupancy (insn's)
ifq_rate                     0.7207 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5434 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9988 # fraction of time (cycle's) IFQ was full
RUU_count                 617891117 # cumulative RUU occupancy
RUU_fcount                 38615968 # cumulative RUU full count
ruu_occupancy               15.9808 # avg RUU occupancy (insn's)
ruu_rate                     0.7207 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.1737 # avg RUU occupant latency (cycle's)
ruu_full                     0.9987 # fraction of time (cycle's) RUU was full
LSQ_count                 185301556 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7925 # avg LSQ occupancy (insn's)
lsq_rate                     0.7207 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6497 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  839700307 # total number of slip cycles
avg_sim_slip                30.1403 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860776 # total number of accesses
il1.hits                   27860565 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5831566 # total number of hits
dl1.misses                  2821832 # total number of misses
dl1.replacements            2820808 # total number of replacements
dl1.writebacks               953977 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3261 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3260 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3776020 # total number of accesses
ul2.hits                    3771720 # total number of hits
ul2.misses                     4300 # total number of misses
ul2.replacements               3788 # total number of replacements
ul2.writebacks                 1258 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0011 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0010 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0003 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860776 # total number of accesses
itlb.hits                  27860770 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017686 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:09:50 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
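
The cache and TLB settings in the option dump above use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format that sim-outorder documents in its help text, so total capacity is nsets x bsize x assoc. A minimal sketch that works out the sizes implied by this run's settings; the helper name `cache_capacity_bytes` is hypothetical and not part of SimpleScalar:

```python
def cache_capacity_bytes(config: str) -> int:
    """Capacity implied by a '<name>:<nsets>:<bsize>:<assoc>:<repl>' string.

    Hypothetical helper for reading this log; it is not a SimpleScalar API.
    """
    _name, nsets, bsize, assoc, _repl = config.split(":")
    return int(nsets) * int(bsize) * int(assoc)


if __name__ == "__main__":
    # Config strings copied from the option dump of this run.
    for cfg in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:32:1024:8:l"):
        print(cfg, cache_capacity_bytes(cfg) // 1024, "KB")
```

Under that reading, dl1 and il1 are each 64 KB and the unified L2 (`ul2:32:1024:8:l`) is 256 KB, built from 32 sets of 1024-byte blocks.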

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148995 # total number of instructions executed
sim_total_refs              4034201 # total number of loads and stores executed
sim_total_loads             3020644 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6865040 # total simulation time in cycles
sim_IPC                      1.9093 # instructions per cycle
sim_CPI                      0.5238 # cycles per instruction
sim_exec_BW                  1.9154 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25504479 # cumulative IFQ occupancy
IFQ_fcount                  6250520 # cumulative IFQ full count
ifq_occupancy                3.7151 # avg IFQ occupancy (insn's)
ifq_rate                     1.9154 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9397 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9105 # fraction of time (cycle's) IFQ was full
RUU_count                 105309935 # cumulative RUU occupancy
RUU_fcount                  5686602 # cumulative RUU full count
ruu_occupancy               15.3400 # avg RUU occupancy (insn's)
ruu_rate                     1.9154 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0090 # avg RUU occupant latency (cycle's)
ruu_full                     0.8283 # fraction of time (cycle's) RUU was full
LSQ_count                  32189372 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6889 # avg LSQ occupancy (insn's)
lsq_rate                     1.9154 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4480 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154551654 # total number of slip cycles
avg_sim_slip                11.7914 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189495 # total number of accesses
il1.hits                   13189317 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013188 # total number of accesses
dl1.hits                    4007461 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6239 # total number of hits
ul2.misses                      158 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0247 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189495 # total number of accesses
itlb.hits                  13189489 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357475 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
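
The derived figures in the statistics block above (sim_IPC, the cache miss rates) can be recomputed from the raw counters as a sanity check. A minimal sketch, assuming each statistic line has the form `name value # description`; the names `STAT_LINE` and `parse_stats` are hypothetical and not part of SimpleScalar:

```python
import re

# Matches "name   value # description"; the value is the second token on the line.
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#")


def parse_stats(text: str) -> dict:
    """Collect numeric statistics from sim-outorder style output lines."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match is None:
            continue
        try:
            stats[match.group(1)] = float(match.group(2))
        except ValueError:
            # Skip non-decimal values such as hex addresses or "96k".
            continue
    return stats


# Four lines copied from the matrix run above.
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6865040 # total simulation time in cycles
dl1.accesses                4013188 # total number of accesses
dl1.misses                     5727 # total number of misses
"""

if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    print("IPC          :", round(s["sim_num_insn"] / s["sim_cycle"], 4))
    print("dl1 miss rate:", round(s["dl1.misses"] / s["dl1.accesses"], 4))
```

For the matrix run these recomputed values agree with the reported sim_IPC of 1.9093 and dl1.miss_rate of 0.0014.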

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:09:58 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264893 # total number of instructions executed
sim_total_refs              4823978 # total number of loads and stores executed
sim_total_loads             2865507 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6430032 # total simulation time in cycles
sim_IPC                      1.8012 # instructions per cycle
sim_CPI                      0.5552 # cycles per instruction
sim_exec_BW                  1.9074 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18912751 # cumulative IFQ occupancy
IFQ_fcount                  3899858 # cumulative IFQ full count
ifq_occupancy                2.9413 # avg IFQ occupancy (insn's)
ifq_rate                     1.9074 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5420 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6065 # fraction of time (cycle's) IFQ was full
RUU_count                  77911167 # cumulative RUU occupancy
RUU_fcount                  3295968 # cumulative RUU full count
ruu_occupancy               12.1168 # avg RUU occupancy (insn's)
ruu_rate                     1.9074 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3524 # avg RUU occupant latency (cycle's)
ruu_full                     0.5126 # fraction of time (cycle's) RUU was full
LSQ_count                  32287831 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0214 # avg LSQ occupancy (insn's)
lsq_rate                     1.9074 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6325 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124264875 # total number of slip cycles
avg_sim_slip                10.7296 # the average slip between issue and retirement
bpred_bimod.lookups         3257776 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       442060 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820428 # total number of accesses
il1.hits                   12820211 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497783 # total number of accesses
dl1.hits                    4486578 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18212 # total number of hits
ul2.misses                      289 # total number of misses
ul2.replacements                 34 # total number of replacements
ul2.writebacks                   21 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0156 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0018 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0011 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820428 # total number of accesses
itlb.hits                  12820421 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514758 # total number of accesses
dtlb.hits                   4514692 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918790 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:10:06 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
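
  The range syntax above can be illustrated with a small Python sketch. It is
  not the simulator's own parser: the names `Endpoint`, `parse_endpoint` and
  `parse_range` are hypothetical, values are kept as text because program
  symbols are allowed, and a `+' end is only labelled as relative rather than
  resolved against the start.

```python
from typing import NamedTuple, Optional, Tuple

# Prefix characters described in the help text above.
KINDS = {"@": "address", "#": "cycle"}


class Endpoint(NamedTuple):
    kind: str   # "address", "cycle", "inst", or "relative"
    value: str  # kept as text: may be a number or a program symbol


def parse_endpoint(text: str, allow_relative: bool) -> Optional[Endpoint]:
    """Classify one side of a range; an empty side means 'unbounded'."""
    if text == "":
        return None
    if text[0] in KINDS:
        return Endpoint(KINDS[text[0]], text[1:])
    if text[0] == "+":
        if not allow_relative:
            raise ValueError("'+' is only meaningful on the end of a range")
        return Endpoint("relative", text[1:])
    # No prefix: an instruction count.
    return Endpoint("inst", text)


def parse_range(text: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' and classify both sides."""
    if ":" not in text:
        raise ValueError("range must contain ':'")
    start, end = text.split(":", 1)
    return parse_endpoint(start, False), parse_endpoint(end, True)


if __name__ == "__main__":
    for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(example, "->", parse_range(example))
```

  Run on the five examples listed above, the sketch reports a cycle range, an
  open-ended address range, an open-started instruction range, a fully open
  range, and an address start with a relative end.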

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
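
  As a worked illustration of the <config> format, the following Python
  sketch splits a config string into its five fields and derives the total
  capacity as nsets * bsize * assoc. The names `CacheConfig` and
  `parse_cache_config` are hypothetical helpers, not part of the simulator.
  For a TLB config, <bsize> is the page size, so the product is the address
  range covered rather than storage.

```python
from typing import NamedTuple

# Replacement-policy letters listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total capacity: number of sets x block size x ways per set.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configs taken from the command line of this run.
    for spec in ("dl1:512:64:2:l", "ul2:32:1024:8:l"):
        cfg = parse_cache_config(spec)
        print(cfg.name, cfg.capacity_bytes // 1024, "KB", REPL_POLICIES[cfg.repl])
```

  For this run's configuration, dl1:512:64:2:l works out to 64 KB and
  ul2:32:1024:8:l to 256 KB, both with LRU replacement.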

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 16 # total simulation time in seconds
sim_inst_rate           832556.7500 # simulation speed (in insts/sec)
sim_total_insn             13379877 # total number of instructions executed
sim_total_refs              6748338 # total number of loads and stores executed
sim_total_loads             3824313 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                  23034515 # total simulation time in cycles
sim_IPC                      0.5783 # instructions per cycle
sim_CPI                      1.7292 # cycles per instruction
sim_exec_BW                  0.5809 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  90926725 # cumulative IFQ occupancy
IFQ_fcount                 22580010 # cumulative IFQ full count
ifq_occupancy                3.9474 # avg IFQ occupancy (insn's)
ifq_rate                     0.5809 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  6.7958 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9803 # fraction of time (cycle's) IFQ was full
RUU_count                 364695282 # cumulative RUU occupancy
RUU_fcount                 22439484 # cumulative RUU full count
ruu_occupancy               15.8326 # avg RUU occupancy (insn's)
ruu_rate                     0.5809 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 27.2570 # avg RUU occupant latency (cycle's)
ruu_full                     0.9742 # fraction of time (cycle's) RUU was full
LSQ_count                 192344001 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3503 # avg LSQ occupancy (insn's)
lsq_rate                     0.5809 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 14.3756 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  576723218 # total number of slip cycles
avg_sim_slip                43.2946 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400120 # total number of accesses
il1.hits                   13399393 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169525 # total number of accesses
dl1.hits                    5881792 # total number of hits
dl1.misses                   287733 # total number of misses
dl1.replacements             286709 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431904 # total number of accesses
ul2.hits                     412579 # total number of hits
ul2.misses                    19325 # total number of misses
ul2.replacements              19069 # total number of replacements
ul2.writebacks                17306 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0447 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0442 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0401 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400120 # total number of accesses
itlb.hits                  13400101 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736797 # total number of hits
dtlb.misses                    4201 # total number of misses
dtlb.replacements              4073 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766324 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
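
The derived rates in a statistics block can be cross-checked from the raw
counters. The Python sketch below is one assumed way to do that: `STAT_LINE`
and `parse_stats` are hypothetical names, the pattern accepts only
`name value # description` lines with a plain numeric value, and the sample
lines are copied from the fft run above.

```python
import re

# A statistic line: a name starting with a letter, a plain decimal value,
# then the '#' that opens the description. Option lines (leading '-'),
# hex values, '292k'-style sizes and error markers are skipped.
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(-?\d+(?:\.\d+)?)\s+#")


def parse_stats(text: str) -> dict:
    """Return {stat_name: float_value} for every numeric statistic line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                  23034515 # total simulation time in cycles
sim_IPC                      0.5783 # instructions per cycle
dl1.accesses                6169525 # total number of accesses
dl1.misses                   287733 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    ipc = stats["sim_num_insn"] / stats["sim_cycle"]
    miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
    print(f"IPC {ipc:.4f} (reported {stats['sim_IPC']:.4f})")
    print(f"dl1 miss rate {miss_rate:.4f} (reported {stats['dl1.miss_rate']:.4f})")
```

Both recomputed values agree with the reported ones to four decimal places
(0.5783 and 0.0466), which is the precision the simulator prints.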

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:10:22 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450120 # total number of instructions executed
sim_total_refs              6589103 # total number of loads and stores executed
sim_total_loads             4939054 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12626565 # total simulation time in cycles
sim_IPC                      1.6932 # instructions per cycle
sim_CPI                      0.5906 # cycles per instruction
sim_exec_BW                  1.6988 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49287664 # cumulative IFQ occupancy
IFQ_fcount                 11702027 # cumulative IFQ full count
ifq_occupancy                3.9035 # avg IFQ occupancy (insn's)
ifq_rate                     1.6988 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2978 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9268 # fraction of time (cycle's) IFQ was full
RUU_count                 200278864 # cumulative RUU occupancy
RUU_fcount                 12479026 # cumulative RUU full count
ruu_occupancy               15.8617 # avg RUU occupancy (insn's)
ruu_rate                     1.6988 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3370 # avg RUU occupant latency (cycle's)
ruu_full                     0.9883 # fraction of time (cycle's) RUU was full
LSQ_count                  64214318 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0857 # avg LSQ occupancy (insn's)
lsq_rate                     1.6988 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9937 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292276470 # total number of slip cycles
avg_sim_slip                13.6710 # the average slip between issue and retirement
bpred_bimod.lookups         1654485 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           77 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483029 # total number of accesses
il1.hits                   21482850 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944146 # total number of accesses
dl1.hits                    4941015 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       3929 # total number of hits
ul2.misses                      172 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0419 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483029 # total number of accesses
itlb.hits                  21483023 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886370 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:1024:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:10:35 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:1024:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
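
  The range grammar above can be illustrated with a small parser. This is a
  minimal sketch, not the simulator's own code: the function names and the
  "addr"/"cycle"/"inst" labels are hypothetical, numbers are read with
  Python's int(text, 0), and program symbols such as `main' are not resolved
  (they raise ValueError here).

```python
# Hypothetical helper illustrating {{@|#}<start>}:{{@|#|+}<end>}.
# Assumption: only numeric range values are handled, not program symbols.
KINDS = {"@": "addr", "#": "cycle"}


def _parse_point(text):
    """Return (kind, value) for one end of the range, or None if omitted."""
    if not text:
        return None
    if text[0] in KINDS:
        return (KINDS[text[0]], int(text[1:], 0))
    return ("inst", int(text, 0))  # no prefix: instruction count


def parse_ptrace_range(spec):
    """Split '<start>:<end>' into a (start, end) pair of (kind, value)."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    start = _parse_point(start_text)
    if end_text.startswith("+"):
        # '+' makes the end relative to the start, so a start is required.
        if start is None:
            raise ValueError("relative end needs an explicit start")
        end = (start[0], start[1] + int(end_text[1:], 0))
    else:
        end = _parse_point(end_text)
    return start, end


relative = parse_ptrace_range("1000:+100")
absolute = parse_ptrace_range("1000:1100")
whole_run = parse_ptrace_range(":")
```

  Under these assumptions `relative` and `absolute` compare equal, which
  mirrors the 1000:+100 == 1000:1100 equivalence stated above, and
  `whole_run` has both ends omitted.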

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
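
  To make the <config> format concrete, the sketch below splits a config
  string into its five fields and derives the total capacity as
  nsets * bsize * assoc. The class and function names are hypothetical and
  are not part of SimpleScalar; the input strings are the dl1 and dl2
  settings used by the runs in this log.

```python
from typing import NamedTuple

# Replacement policy letters as listed in the <repl> description above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    """One <name>:<nsets>:<bsize>:<assoc>:<repl> cache description."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity: sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse a config string; raises ValueError on a malformed field."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


dl1 = parse_cache_config("dl1:512:64:2:l")
ul2 = parse_cache_config("ul2:32:1024:8:l")
print(dl1.name, dl1.capacity_bytes // 1024, "KB", REPL_POLICIES[dl1.repl])
print(ul2.name, ul2.capacity_bytes // 1024, "KB", REPL_POLICIES[ul2.repl])
```

  For a TLB config the <bsize> field holds the page size, so the same
  product gives the amount of memory the TLB can map rather than a data
  capacity.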

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27865955 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  38664629 # total simulation time in cycles
sim_IPC                      0.7205 # instructions per cycle
sim_CPI                      1.3878 # cycles per instruction
sim_exec_BW                  0.7207 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 154472439 # cumulative IFQ occupancy
IFQ_fcount                 38617868 # cumulative IFQ full count
ifq_occupancy                3.9952 # avg IFQ occupancy (insn's)
ifq_rate                     0.7207 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5434 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9988 # fraction of time (cycle's) IFQ was full
RUU_count                 617891117 # cumulative RUU occupancy
RUU_fcount                 38615968 # cumulative RUU full count
ruu_occupancy               15.9808 # avg RUU occupancy (insn's)
ruu_rate                     0.7207 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.1737 # avg RUU occupant latency (cycle's)
ruu_full                     0.9987 # fraction of time (cycle's) RUU was full
LSQ_count                 185301556 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7925 # avg LSQ occupancy (insn's)
lsq_rate                     0.7207 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6497 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  839700307 # total number of slip cycles
avg_sim_slip                30.1403 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860776 # total number of accesses
il1.hits                   27860565 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5831566 # total number of hits
dl1.misses                  2821832 # total number of misses
dl1.replacements            2820808 # total number of replacements
dl1.writebacks               953977 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3261 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3260 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3776020 # total number of accesses
ul2.hits                    3771720 # total number of hits
ul2.misses                     4300 # total number of misses
ul2.replacements               4044 # total number of replacements
ul2.writebacks                 1354 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0011 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0011 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860776 # total number of accesses
itlb.hits                  27860770 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017686 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
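
The derived figures in the statistics block above (IPC, CPI, miss rates) follow from the raw counters, and a short script can confirm that. The sketch below is an illustration only: `parse_stats` is a hypothetical helper, it assumes the `name value # description` layout seen in this log, and the sample lines are copied from the alphaBlend run above.

```python
def parse_stats(text):
    """Map stat names to floats (or raw strings when not numeric)."""
    stats = {}
    for line in text.splitlines():
        left, sep, _desc = line.partition(" # ")
        parts = left.split(None, 1)
        if not sep or len(parts) != 2:
            continue  # blank lines, banners, prose
        name, raw = parts[0], parts[1].strip()
        if name.startswith(("-", "#")):
            continue  # option listing, not a statistic
        try:
            stats[name] = float(raw)
        except ValueError:
            # e.g. 0x00400000, 1480k, <error: divide by zero>
            stats[name] = raw
    return stats


SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  38664629 # total simulation time in cycles
sim_IPC                      0.7205 # instructions per cycle
sim_CPI                      1.3878 # cycles per instruction
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2821832 # total number of misses
dl1.miss_rate                0.3261 # miss rate (i.e., misses/ref)
"""

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  dl1 miss rate {dl1_miss_rate:.4f}")
```

The recomputed values should agree with the reported sim_IPC, sim_CPI and dl1.miss_rate to the four decimal places the simulator prints.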

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 13:10:59 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148987 # total number of instructions executed
sim_total_refs              4034199 # total number of loads and stores executed
sim_total_loads             3020642 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010951 # total number of branches executed
sim_cycle                   6876643 # total simulation time in cycles
sim_IPC                      1.9060 # instructions per cycle
sim_CPI                      0.5246 # cycles per instruction
sim_exec_BW                  1.9121 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25539447 # cumulative IFQ occupancy
IFQ_fcount                  6259261 # cumulative IFQ full count
ifq_occupancy                3.7139 # avg IFQ occupancy (insn's)
ifq_rate                     1.9121 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9423 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9102 # fraction of time (cycle's) IFQ was full
RUU_count                 105432385 # cumulative RUU occupancy
RUU_fcount                  5695383 # cumulative RUU full count
ruu_occupancy               15.3320 # avg RUU occupancy (insn's)
ruu_rate                     1.9121 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0183 # avg RUU occupant latency (cycle's)
ruu_full                     0.8282 # fraction of time (cycle's) RUU was full
LSQ_count                  32227149 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6865 # avg LSQ occupancy (insn's)
lsq_rate                     1.9121 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4509 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154702926 # total number of slip cycles
avg_sim_slip                11.8030 # the average slip between issue and retirement
bpred_bimod.lookups         1010971 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000565 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189482 # total number of accesses
il1.hits                   13189304 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013192 # total number of accesses
dl1.hits                    4007465 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       6355 # total number of hits
ul2.misses                       42 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189482 # total number of accesses
itlb.hits                  13189476 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357423 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 13:11:07 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
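
The range grammar above can be sketched as a small parser. This is a minimal Python illustration, not code from the simulator: the names `Endpoint`, `parse_endpoint`, and `parse_ptrace_range` are hypothetical, values are kept as text because program symbols are allowed, and a `+' end is only labeled "offset" rather than resolved against the start.

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    kind: str   # "address", "cycle", "instruction", or "offset"
    value: str  # left as text: may be a number or a program symbol


def parse_endpoint(text: str, is_end: bool) -> Optional[Endpoint]:
    """Parse one side of the range; an empty side means it was omitted."""
    if not text:
        return None
    kinds = {"@": "address", "#": "cycle"}
    if is_end:
        kinds["+"] = "offset"  # only the second argument may be relative
    if text[0] in kinds:
        kind, value = kinds[text[0]], text[1:]
    else:
        kind, value = "instruction", text
    if not value:
        raise ValueError("prefix without a value")
    return Endpoint(kind, value)


def parse_ptrace_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' and parse each side independently."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("a range needs a ':' separator")
    return parse_endpoint(start, False), parse_endpoint(end, True)


# The range arguments from the examples above.
for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
    print(spec, "->", parse_ptrace_range(spec))
```

How the simulator itself resolves symbols and relative ends is not shown in this log, so the sketch stops at classification.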

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
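
As an illustration of the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, the following Python sketch splits a config string into its fields and derives the total capacity as nsets * bsize * assoc. The names `CacheConfig` and `parse_cache_config` are hypothetical and not part of SimpleScalar; for a TLB config the same product is the address range covered rather than a data capacity.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the format description.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str    # expanded policy name, e.g. "LRU"

    @property
    def capacity_bytes(self) -> int:
        # Each set holds `assoc` blocks of `bsize` bytes.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


# Configs taken from the command line of this run.
for spec in ["dl1:512:64:2:l", "il1:512:64:2:l", "ul2:16:4096:8:l"]:
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, {cfg.assoc}-way, {cfg.repl}")
```

Under these definitions the L1 caches in this run come to 64 KB each and the unified L2 to 512 KB.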

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264940 # total number of instructions executed
sim_total_refs              4823982 # total number of loads and stores executed
sim_total_loads             2865507 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197077 # total number of branches executed
sim_cycle                   6442056 # total simulation time in cycles
sim_IPC                      1.7978 # instructions per cycle
sim_CPI                      0.5562 # cycles per instruction
sim_exec_BW                  1.9039 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18986070 # cumulative IFQ occupancy
IFQ_fcount                  3918174 # cumulative IFQ full count
ifq_occupancy                2.9472 # avg IFQ occupancy (insn's)
ifq_rate                     1.9039 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5480 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6082 # fraction of time (cycle's) IFQ was full
RUU_count                  78186505 # cumulative RUU occupancy
RUU_fcount                  3314173 # cumulative RUU full count
ruu_occupancy               12.1369 # avg RUU occupancy (insn's)
ruu_rate                     1.9039 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3748 # avg RUU occupant latency (cycle's)
ruu_full                     0.5145 # fraction of time (cycle's) RUU was full
LSQ_count                  32400419 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0295 # avg LSQ occupancy (insn's)
lsq_rate                     1.9039 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6417 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124652644 # total number of slip cycles
avg_sim_slip                10.7631 # the average slip between issue and retirement
bpred_bimod.lookups         3257788 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442071 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435445 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820457 # total number of accesses
il1.hits                   12820240 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      18428 # total number of hits
ul2.misses                       73 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0039 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820457 # total number of accesses
itlb.hits                  12820450 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918906 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
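
The derived statistics in the block above can be cross-checked against the raw counters they are described as ratios of. The Python sketch below is an illustration only: `parse_stats` and `ratio_matches` are hypothetical helper names, the sample lines are copied from the statistics of this run, and the tolerance assumes the simulator prints rates to four decimal places.

```python
def parse_stats(text: str) -> dict:
    """Collect 'name value # description' lines into {name: float}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # option help, blank lines, '<error: ...>' values
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            pass  # hex addresses, sizes such as 292k
    return stats


def ratio_matches(stats, derived, numerator, denominator, tol=1e-4):
    """True if stats[derived] equals numerator/denominator within tol."""
    return abs(stats[numerator] / stats[denominator] - stats[derived]) < tol


SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6442056 # total simulation time in cycles
sim_IPC                      1.7978 # instructions per cycle
sim_CPI                      0.5562 # cycles per instruction
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
"""

stats = parse_stats(SAMPLE)
print(ratio_matches(stats, "sim_IPC", "sim_num_insn", "sim_cycle"))
print(ratio_matches(stats, "sim_CPI", "sim_cycle", "sim_num_insn"))
print(ratio_matches(stats, "dl1.miss_rate", "dl1.misses", "dl1.accesses"))
```

The same check applies to the other per-cache rates, assuming "ref" in their descriptions means the corresponding `accesses` counter.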

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 13:11:14 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13395804 # total number of instructions executed
sim_total_refs              6748334 # total number of loads and stores executed
sim_total_loads             3824309 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390632 # total number of branches executed
sim_cycle                  10639435 # total simulation time in cycles
sim_IPC                      1.2520 # instructions per cycle
sim_CPI                      0.7987 # cycles per instruction
sim_exec_BW                  1.2591 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  40936805 # cumulative IFQ occupancy
IFQ_fcount                 10083471 # cumulative IFQ full count
ifq_occupancy                3.8476 # avg IFQ occupancy (insn's)
ifq_rate                     1.2591 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.0559 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9477 # fraction of time (cycle's) IFQ was full
RUU_count                 164875588 # cumulative RUU occupancy
RUU_fcount                  9937429 # cumulative RUU full count
ruu_occupancy               15.4966 # avg RUU occupancy (insn's)
ruu_rate                     1.2591 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.3080 # avg RUU occupant latency (cycle's)
ruu_full                     0.9340 # fraction of time (cycle's) RUU was full
LSQ_count                  86284586 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1099 # avg LSQ occupancy (insn's)
lsq_rate                     1.2591 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4412 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  270680939 # total number of slip cycles
avg_sim_slip                20.3200 # the average slip between issue and retirement
bpred_bimod.lookups          390943 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90461 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89879 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400110 # total number of accesses
il1.hits                   13399383 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169527 # total number of accesses
dl1.hits                    5881803 # total number of hits
dl1.misses                   287724 # total number of misses
dl1.replacements             286700 # total number of replacements
dl1.writebacks               143442 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     431198 # total number of hits
ul2.misses                      695 # total number of misses
ul2.replacements                567 # total number of replacements
ul2.writebacks                  456 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0016 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0013 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0011 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400110 # total number of accesses
itlb.hits                  13400091 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736794 # total number of hits
dtlb.misses                    4204 # total number of misses
dtlb.replacements              4076 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766272 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
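
The derived figures in the statistics block above (sim_IPC, sim_CPI, the cache miss rates) can be cross-checked against the raw counters. The following is a minimal sketch, assuming each statistic line has the form `name value # description`; the helper name `parse_stats` is hypothetical and is not part of SimpleScalar.

```python
import re

# Matches lines such as "sim_cycle   10639435 # total simulation time in cycles".
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#\s")


def parse_stats(text):
    """Return {stat_name: float} for every numeric statistic line in text."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        # Option dump lines start with "-" or "#"; they are not statistics.
        if name.startswith(("-", "#")):
            continue
        try:
            stats[name] = float(raw)
        except ValueError:
            # Skip non-numeric values such as "760k" or "0x00400000".
            continue
    return stats


# Four counters copied from the statistics block above.
sample = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                  10639435 # total simulation time in cycles
dl1.accesses                6169527 # total number of accesses
dl1.misses                   287724 # total number of misses
"""

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"IPC={ipc:.4f} CPI={cpi:.4f} dl1.miss_rate={dl1_miss_rate:.4f}")
```

Rounded to four decimals, the three values should agree with the sim_IPC, sim_CPI and dl1.miss_rate lines reported by the simulator for this run.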

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 13:11:26 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450132 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939055 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12639144 # total simulation time in cycles
sim_IPC                      1.6915 # instructions per cycle
sim_CPI                      0.5912 # cycles per instruction
sim_exec_BW                  1.6971 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49326535 # cumulative IFQ occupancy
IFQ_fcount                 11711708 # cumulative IFQ full count
ifq_occupancy                3.9027 # avg IFQ occupancy (insn's)
ifq_rate                     1.6971 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2996 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9266 # fraction of time (cycle's) IFQ was full
RUU_count                 200415227 # cumulative RUU occupancy
RUU_fcount                 12488563 # cumulative RUU full count
ruu_occupancy               15.8567 # avg RUU occupancy (insn's)
ruu_rate                     1.6971 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3433 # avg RUU occupant latency (cycle's)
ruu_full                     0.9881 # fraction of time (cycle's) RUU was full
LSQ_count                  64253301 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0837 # avg LSQ occupancy (insn's)
lsq_rate                     1.6971 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9955 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292451747 # total number of slip cycles
avg_sim_slip                13.6792 # the average slip between issue and retirement
bpred_bimod.lookups         1654488 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483041 # total number of accesses
il1.hits                   21482862 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944195 # total number of accesses
dl1.hits                    4941064 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       4056 # total number of hits
ul2.misses                       45 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0110 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483041 # total number of accesses
itlb.hits                  21483035 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886420 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
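
The dl1 and ul2 counters above belong to caches that this run's command line configures as `dl1:512:64:2:l` and `ul2:16:4096:8:l`, in the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format. As a rough sketch of what those strings imply, the capacity of each cache is sets x block size x associativity. The names `CacheConfig` and `parse_cache_config` below are hypothetical helpers for illustration, not SimpleScalar code.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    """One <name>:<nsets>:<bsize>:<assoc>:<repl> cache description."""
    name: str
    nsets: int   # number of sets
    bsize: int   # block size in bytes
    assoc: int   # associativity (blocks per set)
    repl: str    # replacement policy: 'l' LRU, 'f' FIFO, 'r' random

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity = sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(config: str) -> CacheConfig:
    """Split a colon-separated config string into typed fields."""
    name, nsets, bsize, assoc, repl = config.split(":")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# Config strings taken from this run's command line.
for text in ("dl1:512:64:2:l", "ul2:16:4096:8:l"):
    cfg = parse_cache_config(text)
    print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB")
```

Under this reading, dl1 holds 64 KB and ul2 holds 512 KB. The ul2 capacity exceeds the 120k of pages reported in mem.page_mem for this run, which is at least consistent with ul2.replacements being 0.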

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:4096:8:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 13:11:39 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:4096:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860691 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  38767773 # total simulation time in cycles
sim_IPC                      0.7186 # instructions per cycle
sim_CPI                      1.3915 # cycles per instruction
sim_exec_BW                  0.7187 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 154820295 # cumulative IFQ occupancy
IFQ_fcount                 38704824 # cumulative IFQ full count
ifq_occupancy                3.9935 # avg IFQ occupancy (insn's)
ifq_rate                     0.7187 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5569 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9984 # fraction of time (cycle's) IFQ was full
RUU_count                 619281695 # cumulative RUU occupancy
RUU_fcount                 38704264 # cumulative RUU full count
ruu_occupancy               15.9741 # avg RUU occupancy (insn's)
ruu_rate                     0.7187 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.2278 # avg RUU occupant latency (cycle's)
ruu_full                     0.9984 # fraction of time (cycle's) RUU was full
LSQ_count                 185536057 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7858 # avg LSQ occupancy (insn's)
lsq_rate                     0.7187 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6594 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  841326702 # total number of slip cycles
avg_sim_slip                30.1987 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860774 # total number of accesses
il1.hits                   27860563 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5805632 # total number of hits
dl1.misses                  2847766 # total number of misses
dl1.replacements            2846742 # total number of replacements
dl1.writebacks               953640 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3291 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3290 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1102 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3801617 # total number of accesses
ul2.hits                    3800533 # total number of hits
ul2.misses                     1084 # total number of misses
ul2.replacements                956 # total number of replacements
ul2.writebacks                  318 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0003 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0003 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860774 # total number of accesses
itlb.hits                  27860768 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017678 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
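
The statistics above share a regular "name value # description" layout, so
they can be read back mechanically. The following Python sketch is one
possible way to do that; it is not a SimpleScalar tool, and the names
STAT_LINE, parse_value, and parse_stats are hypothetical. It also
recomputes a few derived rates from the raw counters as a sanity check,
using a short excerpt of the lines above as sample input.

```python
import re

# A statistic line: a name starting with a letter or underscore, a value,
# then '#' and a description. Option lines ("-seed ...") and banner lines
# ("sim: ** ... **") do not match this pattern.
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(\S+)\s+#\s*(.*)$")


def parse_value(text):
    """Convert a value to int (decimal or 0x hex), else float, else keep the string."""
    try:
        return int(text, 0)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text  # e.g. "1480k"


def parse_stats(log_text):
    """Return {name: value} for every statistic line found in log_text."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            name, value, _description = match.groups()
            stats[name] = parse_value(value)
    return stats


SAMPLE = """\
sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  38767773 # total simulation time in cycles
sim_IPC                      0.7186 # instructions per cycle
sim_CPI                      1.3915 # cycles per instruction
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2847766 # total number of misses
dl1.miss_rate                0.3291 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
mem.page_mem                  1480k # total size of memory pages allocated
"""

stats = parse_stats(SAMPLE)

# Derived rates recomputed from the raw counters.
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

if __name__ == "__main__":
    print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  dl1 miss rate {dl1_miss_rate:.4f}")
```

If a log holds several statistics blocks, this sketch keeps only the last
value seen for each name; splitting the text on the
"sim: ** simulation statistics **" marker first would keep the runs apart.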

