A Stack Processor

An 8-bit Stack Processor

Author: Steven Sutankayo
Designers: Rob Chapman and Steven Sutankayo

1.0 Introduction

Digital systems can sometimes be implemented more cheaply or easily with a processor-based solution. Implementing functions algorithmically, rather than in dedicated hardware, can save hardware. Instead of using an external processor, a lightweight processor can be synthesized on the FPGA, leaving the remainder of the chip available for system peripherals. An on-chip processor is thus suitable for "system on a chip" applications. This application note presents a general-purpose 8-bit stack processor (SP) suitable for such applications.

A general overview and history of the processor are presented, along with the source files and example usage.

2.0 Overview

For an overview of stack-based processors, refer to http://www.cs.cmu.edu/~koopman/stack_computers/index.html .

The processor is an Altera FPGA implementation of an existing processor that my lab partner, Rob Chapman, created in previous courses. His papers "A Stack Processor: Synthesis" and "A Writable Computer" describe the creation and evolution of the processor. Our project specification document also describes how we use the stack processor to implement a neuroprocessor network.

2.1 Architecture

The processor has an 8-bit data bus, an 8-bit address bus, one data stack, and one return stack. It is a zero-operand processor: instruction opcodes contain no fields specifying the source, destination, or value of their arguments, because arithmetic operands are implied by the stack. As is typical of stack processors, there are no general-purpose data or address registers. Addition and subtraction are performed with a statistical addition circuit in order to conserve chip resources.
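As a rough illustration of what "zero operand" means, the Python sketch below models a generic data stack whose arithmetic operations name no operands at all. It is a simplification under stated assumptions:
- DataStack and its methods are hypothetical names, not taken from the SP's VHDL source.
- The SP's separate TOP register is folded into the list.
- Ordinary modulo-256 arithmetic stands in for the statistical addition circuit, which is not modeled.

```python
# Generic zero-operand stack arithmetic with an 8-bit data path.
# Hypothetical sketch: DataStack and its methods are illustrative names,
# not taken from the SP's VHDL source.

MASK = 0xFF  # results wrap modulo 256, like an 8-bit data bus


class DataStack:
    def __init__(self):
        self.items = []

    def push(self, value):
        self.items.append(value & MASK)

    def pop(self):
        return self.items.pop()

    def add(self):
        # No source or destination fields: both operands come from the
        # stack and the result goes back onto it.
        b = self.pop()
        a = self.pop()
        self.push(a + b)

    def subtract(self):
        b = self.pop()
        a = self.pop()
        self.push(a - b)  # negative results wrap, e.g. 5 - 10 -> 251


s = DataStack()
s.push(200)
s.push(100)
s.add()
print(s.pop())  # prints 44 (300 mod 256)
```

The instruction carries only "what to do". Both the "where from" and the "where to" are fixed by the stack, which is why the opcodes can be so small.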

2.2 Programmability

Instructions are available for data stack manipulation, program branching and returning, conditional branching, addition, subtraction, pointer fetching, pointer loading, and single-instruction incrementing of counters and pointers. Text source code files can be assembled into Memory Initialization File (.mif) format, which is used to initialize memory at compile time.
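For readers unfamiliar with .mif files, the sketch below writes a minimal Altera Memory Initialization File from a list of 8-bit words. It uses the standard WIDTH/DEPTH/CONTENT syntax. It is not SPASM's output routine, and the exact layout SPASM produces may differ. The program bytes in the usage line are placeholders.

```python
def to_mif(words, depth=256):
    """Return the text of an 8-bit-wide Altera .mif file.

    words are placed starting at address 0; any locations left over up to
    `depth` are zero-filled with a single address-range entry.
    """
    if not 0 < depth <= 256:
        raise ValueError("the SP has an 8-bit address bus (at most 256 words)")
    if len(words) > depth:
        raise ValueError("program does not fit in memory")
    lines = [
        "WIDTH=8;",
        f"DEPTH={depth};",
        "ADDRESS_RADIX=HEX;",
        "DATA_RADIX=HEX;",
        "CONTENT BEGIN",
    ]
    for addr, word in enumerate(words):
        lines.append(f"\t{addr:02X} : {word & 0xFF:02X};")
    if len(words) < depth:
        lines.append(f"\t[{len(words):02X}..{depth - 1:02X}] : 00;")
    lines.append("END;")
    return "\n".join(lines) + "\n"


print(to_mif([0x12, 0xAB], depth=4))
```

A range entry such as [02..03] : 00; keeps the file short when only a small program occupies a large memory.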

2.3 Synthesis Issues

A simple test system requires 500 logic cells (LCs), approximately 90% of an Altera FLEX 10K10 FPGA. Version 8.1 of the MAX+PLUS II (maxplus2) design package was used. Since the assembler outputs .mif files, program memory should be implemented with the LPM_RAM_DQ megafunction, which builds memory from Altera's Embedded Array Blocks (EABs). Using EABs for memory also avoids the delay that would be associated with implementing the memory as combinational lookup tables.

3.0 Design Files

Two sets of files are available: stackprocessor.tar and example.tar.

3.1 Stackprocessor.tar

Included within stackprocessor.tar are all the source code files and project configuration files (.cnf files) required to compile the SP.

Also included is the Stack Processor Assembler (SPASM), which assembles the user's source code file into a .mif file. Because SPASM is a two-pass assembler, the programmer may use labels for program counter values and the assembler will resolve them. If the "-w" option is used, SPASM also writes the instruction code constants to a VHDL package. This is only necessary if the designer wants to modify instructions or add instructions to the processor's instruction set. The PERL interpreter is required to run SPASM.
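The two-pass idea can be sketched in a few lines of Python. The first pass only counts bytes to learn each label's address, and the second pass emits code with every label (even a forward reference) already known. This is a hypothetical sketch, not SPASM itself, and the following are all assumptions:
- the source syntax ("label:" prefixes, ";" comments, hex operands);
- the opcode values;
- which instructions take a one-byte operand.

Only the mnemonics come from the instruction table in section 4.3.

```python
# Hypothetical two-pass assembler sketch. The source syntax ("label:" prefixes,
# ";" comments, hex operands), the opcode values and the instruction lengths
# are all assumptions; the real ones are defined by SPASM.

OPCODES = {
    "psh_mem_DS": 0x01,
    "pop_DS_TOP": 0x02,
    "add": 0x03,
    "bnzero_imm": 0x04,
    "bra_mem": 0x05,
}
# Instructions assumed to be followed by a one-byte operand at PC+1.
TAKES_OPERAND = {"psh_mem_DS", "bnzero_imm", "bra_mem"}


def parse(source):
    """Yield (label, mnemonic, operand) for each non-blank source line."""
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        label = None
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
        fields = line.split()
        mnemonic = fields[0] if fields else None
        operand = fields[1] if len(fields) > 1 else None
        yield label, mnemonic, operand


def assemble(source):
    # Pass 1: only count bytes, recording the address of every label.
    labels, pc = {}, 0
    for label, mnemonic, _ in parse(source):
        if label:
            labels[label] = pc
        if mnemonic:
            pc += 2 if mnemonic in TAKES_OPERAND else 1
    # Pass 2: emit bytes; labels (even forward ones) are now all known.
    code = []
    for _, mnemonic, operand in parse(source):
        if not mnemonic:
            continue
        code.append(OPCODES[mnemonic])
        if mnemonic in TAKES_OPERAND:
            value = labels[operand] if operand in labels else int(operand, 16)
            code.append(value & 0xFF)
    return code


program = """
start:  psh_mem_DS 05
        pop_DS_TOP
        bnzero_imm done   ; forward reference, resolved in pass 2
        bra_mem start
done:   add
"""
print([f"{b:02X}" for b in assemble(program)])
```

A single-pass assembler would not yet know the address of "done" when it reached bnzero_imm. This is why the labels are collected in a separate pass.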

3.2 Example.tar

The example.tar package contains a sample memory design and a sample top-level design file. It also includes a summation program that can be assembled by SPASM to provide the memory initialization file for the test system.

4.0 Using the Processor in a Design

Using the stack processor to create a digital system involves designing the memory, creating the top-level entity (i.e. your stack-processor-based system), writing the program source file, assembling it, initializing memory, and simulating the design. If new instructions are needed, the stack processor can be modified, but this requires detailed knowledge of the processor's internals.

4.1 Memory Specification

To function properly, the SP requires that system memory behave in a well-defined way. Memory is synchronous: to access a particular memory location, the address is placed on the address lines and is latched in on the next rising clock edge. The SP assumes that the data will be available on the following rising clock edge, one clock cycle after the address is latched.

As mentioned, the LPM_RAM_DQ megafunction is used to implement program memory. Altera recommends that this memory be configured as synchronous. The RAM has two clock inputs: address_in, which registers the address, and data_out, which registers the output data. If both are clocked on the rising edge, fetching a memory location takes two clock cycles, which violates the memory specification above. The data_out register should therefore be falling-edge triggered, which can be done by clocking it with the inverted system clock. The address is then latched on a rising edge, the data is registered half a cycle later on the falling edge, and it is available at the next rising edge, as the specification requires. NOTE: This did not work when I attempted it with version 7.1 of maxplus2 (see Endnotes).
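The timing argument can be checked with a small half-cycle model. The Python sketch below steps through rising and falling edges and reports how many rising edges after the address is latched the processor first sees the new data. This is an idealized sketch under these assumptions:
- registers are ideal, with no propagation delay;
- the strings "old" and "new" merely tag whose data a register holds;
- the version 7.1 tool behaviour described in the endnote is not modeled.

```python
# Half-cycle model of the registered RAM read path.
# Assumptions: ideal registers, no propagation delay; "old"/"new" tag data
# belonging to the previous address / to the address just issued.


def fetch_latency(out_on_falling_edge):
    """Return the number of rising edges, counted from the edge that latches a
    new address, until the CPU samples that address's data on the RAM output."""
    addr_reg = "old"  # address register, clocked on the rising edge
    q = "old"         # output register: whose data it currently holds
    for step in range(10):  # even step = rising edge, odd step = falling edge
        if step % 2 == 0:
            edge = step // 2
            # The CPU samples q at this edge (value from before the edge).
            if edge > 0 and q == "new":
                return edge
            # All rising-edge registers capture their inputs simultaneously.
            next_q = q if out_on_falling_edge else addr_reg
            addr_reg = "new"  # the new address is held on the bus from edge 0
            q = next_q
        elif out_on_falling_edge:
            # Inverted clock: the output register loads half a cycle later.
            q = addr_reg
    raise RuntimeError("data never reached the output")


print(fetch_latency(out_on_falling_edge=False))  # 2: violates the spec
print(fetch_latency(out_on_falling_edge=True))   # 1: meets the spec
```

Clocking the output register on the falling edge lets it load within the same cycle in which the address was latched. This halves the fetch latency without changing the address side at all.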

Eight address bits give 2^8 = 256 separate memory locations. For a general-purpose system, 128 locations could be allocated to program RAM and the rest to ROMs or peripherals. Larger programs could be accommodated by allocating more address bits, and therefore more locations, to program memory.

Care must be taken when designing complex memory systems because of the synchronous memory requirements. The example memory design uses a synchronous state machine to multiplex the two memory components. This method can be scaled to more complicated memory systems by cascading memory-decoding state machines.

The memory system included with the sample design allocates the first 128 locations to an EAB-based RAM, which is implemented with the Altera lpm_ram_dq megafunction.

The upper half of the memory space (the last 128 locations) is left undecoded, except for one 8-bit register, which is fully decoded at address FF (hexadecimal).
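The address decoding for this example memory map can be written as a tiny Python function:
- RAM responds when address bit 7 is 0, i.e. locations 0x00-0x7F.
- The register responds only when all eight address bits match FF.
- Everything else selects no device.

The device names and the use of None for "undecoded" are illustrative assumptions. The sketch does not model the state machine the example design uses to multiplex the two components.

```python
RAM_SIZE = 128      # locations 0x00-0x7F: EAB-based RAM
IO_REG_ADDR = 0xFF  # the single fully decoded 8-bit register


def decode(address):
    """Return which device responds to an 8-bit address in the example map."""
    if not 0 <= address <= 0xFF:
        raise ValueError("the SP has an 8-bit address bus")
    if address < RAM_SIZE:       # equivalent to address bit 7 == 0
        return "ram"
    if address == IO_REG_ADDR:   # full decode: all eight bits must match
        return "register"
    return None                  # undecoded: no device is selected


for a in (0x00, 0x7F, 0x80, 0xFE, 0xFF):
    print(f"{a:02X} -> {decode(a)}")
```

In hardware, this corresponds to a RAM select of NOT A7 and a register select equal to the AND of all eight address lines.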

4.2 Designing The Top-Level Entity

The top-level design merely instantiates the stack processor and memory components and provides the required inputs and outputs. The processor data and address lines must be connected to memory, and inputs/outputs for peripherals must be provided.

4.3 Creating Program File

This version of the stack processor application note does not include extensive documentation of the instructions available for programs, but a simple and well commented example program has been included.

A brief description of each instruction is given here:

Instruction	Description	Action
psh_mem_DS	push the contents of the given memory location to the data stack	DS_index <= DS_index - 1; DS(index) <= memory(PC+1)
pop_DS_TOP	pop the data stack and store its contents to TOP	TOP <= DS(index); DS_index <= DS_index + 1
sto_DS_memimm	store the contents of the data stack to the given memory location	memory(PC+1) <= DS(index)
sto_memimm_DS_and_sto_DS_TOP	fetch the contents of the given memory address and store them to the data stack, while storing the current data stack to TOP	TOP <= DS(index); DS(index) <= memory(memory(PC+1))
bra_mem	branch to the given memory location	PC <= memory(PC+1)
bnzero_imm	branch to the given memory location if TOP is nonzero	if TOP != 0 then PC <= memory(PC+1)
bzero_imm	branch to the given memory location if TOP is zero	if TOP = 0 then PC <= memory(PC+1)
add	add TOP to the data stack, placing the result in the data stack	DS(index) <= DS + TOP
subtract	subtract TOP from the data stack, placing the result in the data stack	DS(index) <= DS - TOP
sto_mem_DS	store the contents of the given memory location to the data stack	DS(index) <= memory(PC+1)
psh_memptr_DS	push the contents of the location referenced by the given pointer to the data stack	DS(index) <= memory(memory(memory(PC+1)))
sto_DS_TOP	copy the data stack to TOP	TOP <= DS
swap	exchange the contents of the data stack and TOP	TOP <= DS, DS <= TOP
sto_TOP_DS	copy TOP to the data stack	DS <= TOP
incr_ptr	increment given pointer by 1	ptr = memory(memory(PC+1)); memory(ptr) <= memory(ptr) + 1
sto_DS_memptr	store the data stack to the location referenced by the given pointer	ptr = memory(memory(PC+1)); memory(ptr) <= DS
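To make the Action column concrete, the Python sketch below executes a few of these register transfers on a software model of the SP state (memory, data stack array, DS_index, TOP, PC). The method names mirror the mnemonics above. The following are assumptions not stated in this note:
- the class and attribute names;
- the 16-entry data stack;
- the program counter advancing by 2 for instructions that use memory(PC+1) and by 1 otherwise.

Opcode bytes are placeholders.

```python
# Register-transfer sketch of a few SP instructions, following the Action
# column. Assumptions (not from the original design): class/attribute names,
# a 16-entry data stack array, and PC += 2 for instructions that read
# memory(PC+1), PC += 1 otherwise.

MASK = 0xFF


class SPModel:
    def __init__(self, program, ds_depth=16):
        self.mem = (list(program) + [0] * 256)[:256]
        self.ds = [0] * ds_depth
        self.ds_index = ds_depth  # empty stack; pushes pre-decrement
        self.top = 0
        self.pc = 0

    def _operand(self):
        return self.mem[(self.pc + 1) & MASK]  # memory(PC+1)

    def _next(self, size):
        self.pc = (self.pc + size) & MASK

    def psh_mem_DS(self):
        self.ds_index -= 1
        self.ds[self.ds_index] = self._operand()
        self._next(2)

    def pop_DS_TOP(self):
        self.top = self.ds[self.ds_index]
        self.ds_index += 1
        self._next(1)

    def add(self):
        self.ds[self.ds_index] = (self.ds[self.ds_index] + self.top) & MASK
        self._next(1)

    def subtract(self):
        self.ds[self.ds_index] = (self.ds[self.ds_index] - self.top) & MASK
        self._next(1)

    def swap(self):
        self.top, self.ds[self.ds_index] = self.ds[self.ds_index], self.top
        self._next(1)

    def bra_mem(self):
        self.pc = self._operand()

    def bzero_imm(self):
        if self.top == 0:
            self.pc = self._operand()
        else:
            self._next(2)

    def bnzero_imm(self):
        if self.top != 0:
            self.pc = self._operand()
        else:
            self._next(2)


# Placeholder opcode bytes (0x00) followed by the operands 7 and 5.
cpu = SPModel([0x00, 0x07, 0x00, 0x05])
cpu.psh_mem_DS()  # DS = 7
cpu.psh_mem_DS()  # DS = 5, with 7 below it
cpu.pop_DS_TOP()  # TOP = 5, DS = 7
cpu.add()         # DS = 7 + 5
print(cpu.ds[cpu.ds_index], cpu.top)  # prints "12 5"
```

Following the table literally, add leaves its result in DS(index) without popping TOP. Whether the real hardware behaves this way should be checked against the VHDL source.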

4.4 Running the assembler

The SPASM assembler assembles a program file and outputs a Memory Initialization File (.mif), which the Altera compiler and simulator use to initialize the RAM. It is written in PERL and expects the interpreter to be at /usr/local/bin/perl. If PERL is installed elsewhere, edit the first line of the SPASM source code to point to it.

To assemble your source file, type:

    cat program_file.src | spasm > program_file.mif

4.5 Initializing memory, compilation, and simulation

If you wish to initialize the memory via the compiler, just recompile your memory file or your top-level file. If no compilation is necessary, you may choose the "Initialize Memory" menu item in the Simulator menu. A dialog box will appear where you can inspect and initialize the contents of memory.

You may then simulate. Be careful that the simulator does not overwrite your memory initialization between simulation runs. If this happens, re-initialize memory manually, or recompile your memory file with the new .mif file present.

Endnotes

Version Discrepancy for EAB-based RAM:

When I attempted to clock the data_out port of the LPM_RAM_DQ megafunction with an inverted clock in version 7.1 of the tools, it did not work: it still took two clock cycles to fetch a memory location.

References

Altera Corporation. "Guide to EAB-based RAM megafunctions", http://www.altera.com/document/an/an052_01.pdf
Rob Chapman. "A Stack Processor: Synthesis", http://www.compusmart.ab.ca/rc/Papers/spsynthesis.pdf
Rob Chapman. "A Writable Computer", http://www.compusmart.ab.ca/rc/Papers/writablecomputer.pdf
Steven Sutankayo, Rob Chapman. "Implementing A Neuroprocessor Network: Product Specification", http://www.compusmart.ab.ca/rc/Papers/inndps.pdf
