Computational RAM

Computational RAM (C-RAM) is semiconductor random access memory with processors incorporated into the design to build an inexpensive massively-parallel computer.

Bandwidth Graph
In a typical computer with 32MB of memory built from 16Mb DRAM chips and a 100MHz processor, 3000 times as much bandwidth is available inside the memory chips as at the CPU. If you can't bring the memory bandwidth to the processor, then bring the processors to the memory.

Computational RAM Page Highlights


Computational RAM: A Memory-SIMD Hybrid and its Application to DSP

Duncan G. Elliott, W. Martin Snelgrove, and Michael Stumm. Computational RAM: A Memory-SIMD Hybrid and its Application to DSP. In Custom Integrated Circuits Conference, pages 30.6.1--30.6.4, Boston, MA, May 1992.

Computational RAM (C-RAM) is conventional RAM with SIMD processors added to the sense amplifiers. These bit-serial, externally programmed processors add only a small amount of area to the chip and, in a 32Mbyte memory, have an aggregate performance of 13 billion 32-bit operations per second. The chips are extendible and completely software programmable. In this paper we describe (1) the C-RAM architecture, (2) a working 8Kbit prototype, (3) a full scale C-RAM designed in a 4Mbit DRAM process, and (4) C-RAM applications.
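To make "bit-serial" SIMD processing concrete, here is a minimal software sketch. It is an illustration only, not the C-RAM hardware or its instruction set: the row-per-bit memory layout, the helper names (to_bit_rows, from_bit_rows, bit_serial_add) and the 8-bit width are all assumptions chosen for the example. Each column stands for one processing element, and every step applies the same one-bit full-adder operation to all columns at once.

```python
"""Bit-serial SIMD addition: a simplified software model (assumed layout)."""


def to_bit_rows(values, width):
    """Store one operand per PE column; row i holds bit i of every operand."""
    return [[(v >> i) & 1 for v in values] for i in range(width)]


def from_bit_rows(rows):
    """Reassemble one integer per PE column from its bit rows."""
    n_pes = len(rows[0])
    return [sum(rows[i][pe] << i for i in range(len(rows))) for pe in range(n_pes)]


def bit_serial_add(a_rows, b_rows):
    """Add two operands in every PE at once, one bit position per step."""
    n_pes = len(a_rows[0])
    carry = [0] * n_pes  # one carry bit per PE
    sum_rows = []
    # The same one-bit operation is applied to every PE at each step.
    for a_row, b_row in zip(a_rows, b_rows):
        sum_rows.append([a ^ b ^ c for a, b, c in zip(a_row, b_row, carry)])
        carry = [(a & b) | (c & (a ^ b)) for a, b, c in zip(a_row, b_row, carry)]
    return sum_rows  # result is modulo 2**width; the final carry is dropped


WIDTH = 8
a = [1, 2, 100, 255]
b = [1, 3, 27, 1]
result = from_bit_rows(bit_serial_add(to_bit_rows(a, WIDTH), to_bit_rows(b, WIDTH)))
print(result)  # [2, 5, 127, 0]
```

The number of steps depends on the operand width, not on the number of columns, which is why adding more processing elements raises aggregate throughput in this model.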

postscript paper less photos (181KB)
chip micrograph (584KB)
PE detail micrograph (136KB)

These files are also available via anonymous FTP from
ftp.eecg.toronto.edu in /pub/tech_reports/dunc/*

Boiler Plate:

The above paper is Copyright 1992 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.


A PetaOp/s is Currently Feasible by Computing in RAM

Duncan G. Elliott, W. Martin Snelgrove, Christian Cojocaru, and Michael Stumm. A PetaOp/s is Currently Feasible by Computing in RAM. In PetaFLOPS Frontier Workshop, Washington DC, February 1995.
Technology considerations dictate that a petaOPS computer implemented with currently available technology would do most of its computing with simple processors integrated into memory and thereby exploit the high internal memory bandwidth. Such a system is proposed.

From the proceedings of the PetaFLOPS Frontier Workshop held February 1995 in Washington DC during The Fifth Symposium On The Frontiers Of Massively Parallel Computation:

postscript paper (51KB)
HTML paper
Slides (65KB)

Other information on PetaFLOPS Enabling Technologies and Applications


Compiler

The prototype C++ compiler and simulator for C-RAM allow architectural investigation and simulated performance measurement, and the compiler, of course, generates code for our chips. The compiler implements data-parallel variables (cint, cuint, cboolean, cvar). You can view source code in the next section. The RTL simulator emulates all ALU, register, memory, and communications operations.
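As a rough illustration of what a data-parallel variable means, the sketch below models a type that holds one value per processing element and applies every operation element by element. It is not the C-RAM compiler's interface: the class name CInt, the where method, and the sample data are hypothetical, and the sketch is written in Python rather than the compiler's C++.

```python
class CInt:
    """Hypothetical stand-in for a data-parallel integer: one element per PE."""

    def __init__(self, values):
        self.values = list(values)

    def _lift(self, other):
        # Broadcast a scalar to every PE so it can be combined element by element.
        if isinstance(other, CInt):
            return other.values
        return [other] * len(self.values)

    def __add__(self, other):
        return CInt(x + y for x, y in zip(self.values, self._lift(other)))

    def __mul__(self, other):
        return CInt(x * y for x, y in zip(self.values, self._lift(other)))

    def __gt__(self, other):
        # A comparison yields a per-PE boolean mask.
        return [x > y for x, y in zip(self.values, self._lift(other))]

    def where(self, mask, other):
        """Per-PE select: keep self where mask is true, else take other."""
        return CInt(
            x if m else y
            for x, y, m in zip(self.values, self._lift(other), mask)
        )


pixels = CInt([10, 200, 90, 255])
scaled = pixels * 2 + 1                      # same operation in every PE
clamped = CInt([255] * 4).where(scaled > 255, scaled)
print(clamped.values)  # [21, 255, 181, 255]
```

The masked select stands in for a conditional: in a SIMD machine all elements see the same instruction stream, so a branch becomes a per-element choice between two results.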


Performance

A workstation can run highly parallel applications thousands of times faster by executing them directly in the memory. In these preliminary results, the simulated performance of applications and kernels run on 32MB of 150ns C-RAM is compared with the measured performance on a SUN SparcStation-5 70MHz workstation.

Simulated performance by application

Program                  C-RAM (ms)    Host (ms)    Speedup    Source code
3x3 Convolution 16M         17.6067     112760.0       6404    Parallel, Sequential
FIR 128K 40b                 0.0991        311.7       3144    Parallel, Sequential
FIR 4M 16b                   1.0437       5144.4       4929    Parallel, Sequential
Vector Quantization         25.746       33780         1312    Parallel, Sequential
Masked Blt                   0.0182        442.8      24310    Parallel, Sequential
LMS Matching                 0.2003        250.9       1253    Parallel, Sequential
Data Mining                 70.66       192450         2724    Parallel, Sequential
Fault Simulation             0.0894       2380.0      26626    Parallel, Sequential
Satisfiability               0.0232        959.0      41391    Parallel, Sequential
Memory Clear                 0.0016          8.8       5493    Parallel, Sequential
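
The speedup column is the host time divided by the C-RAM time. The short check below recomputes it from the times listed above. The variable names are illustrative, and the small differences from the published figures are presumably due to rounding of the displayed times (an assumption, not something the page states).

```python
# (program, C-RAM ms, host ms, published speedup), copied from the table above
rows = [
    ("3x3 Convolution 16M", 17.6067, 112760.0, 6404),
    ("FIR 128K 40b", 0.0991, 311.7, 3144),
    ("FIR 4M 16b", 1.0437, 5144.4, 4929),
    ("Vector Quantization", 25.746, 33780.0, 1312),
    ("Masked Blt", 0.0182, 442.8, 24310),
    ("LMS Matching", 0.2003, 250.9, 1253),
    ("Data Mining", 70.66, 192450.0, 2724),
    ("Fault Simulation", 0.0894, 2380.0, 26626),
    ("Satisfiability", 0.0232, 959.0, 41391),
    ("Memory Clear", 0.0016, 8.8, 5493),
]


def speedup(cram_ms, host_ms):
    """Speedup of C-RAM over the host: host time divided by C-RAM time."""
    return host_ms / cram_ms


for name, cram_ms, host_ms, published in rows:
    print(f"{name:22s} {speedup(cram_ms, host_ms):10.0f}  (published {published})")

# Largest relative gap between the recomputed and published speedups.
max_rel_error = max(abs(speedup(c, h) - p) / p for _, c, h, p in rows)
print(f"largest relative difference: {max_rel_error:.2%}")
```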


Computational RAM: A Memory-SIMD Hybrid

A doctoral thesis in preparation

Computational RAM (C-RAM) is semiconductor random access memory with processors incorporated into the design to build an inexpensive massively-parallel computer. If an application contains sufficient parallelism, it will typically run orders of magnitude faster in C-RAM than on the central processing unit. This work includes architecture, prototype chips, compiler and applications.

C-RAM integrates SIMD (Single Instruction stream, Multiple Data stream) processors into random access memory at the sense amplifiers (along one edge of a two-dimensional array of memory cells). The novel combination of processors with memory (the memory retains its memory interface) allows C-RAM to be used as computer main memory, as a video frame buffer or for stand-alone signal processing. The use of high-density commodity dynamic memory makes C-RAM economical. The bit-serial, externally programmed processing elements (PEs) add only slightly to the cost of the chip (9-20%), yet a workstation with 32Mbytes of C-RAM would have an aggregate performance of 13 billion 32-bit operations per second. A working C-RAM with 64 processing elements per chip has been fabricated, and the PE for a 2048-PE, 4Mbit chip has been designed.
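
The figures in this abstract can be combined into a back-of-the-envelope estimate. The sketch below assumes that the 32Mbyte, 13 billion operations per second system is built from the 2048-PE, 4Mbit chips. The abstract gives both sets of numbers but does not explicitly tie them together, so the per-PE results are illustrative only.

```python
MEMORY_BYTES = 32 * 2**20   # 32 Mbytes of C-RAM (from the text)
CHIP_BITS = 4 * 2**20       # 4 Mbit chip (from the text)
PES_PER_CHIP = 2048         # PEs per chip (from the text)
AGGREGATE_OPS = 13e9        # 32-bit operations per second (from the text)

# Assumption: the whole 32 Mbyte memory is made of 2048-PE, 4 Mbit chips.
chips = MEMORY_BYTES * 8 // CHIP_BITS
total_pes = chips * PES_PER_CHIP
bits_per_pe = CHIP_BITS // PES_PER_CHIP

ops_per_pe = AGGREGATE_OPS / total_pes   # 32-bit operations per second per PE
time_per_op_us = 1e6 / ops_per_pe        # microseconds per 32-bit operation

print(f"chips: {chips}, PEs: {total_pes}, memory bits per PE: {bits_per_pe}")
print(f"per-PE rate: {ops_per_pe:.0f} ops/s, about {time_per_op_us:.1f} us per op")
```

Under this assumption each PE is slow on its own (on the order of ten microseconds per 32-bit operation), and the aggregate figure comes from having more than a hundred thousand of them working at once.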

The performance of C-RAM for kernels and real applications was obtained by simulating their execution. For this purpose, a prototype compiler was written. Applications are drawn from the fields of signal and image processing, computer graphics, synthetic neural networks, CAD, databases and scientific computing.


Computing in Memory Bibliography

Recent efforts to put the processing in the memory. "Smart memory" or "PIM" if you like.

Future

Request a copy of future papers or doctoral thesis.

Professors accepting graduate students

Here's a list of professors accepting graduate students to perform thesis research on aspects of Computational RAM.

This work is sponsored by:
(industry) MOSAID Technologies Inc., IBM T. J. Watson Research Center, Accelerix
(government) NSERC, Micronet, ITRC

Keywords: smart memory, smart DRAM, intelligent memory, intelligent DRAM, processors in memory, processing in memory, computing in memory, pitch-matched logic in memory, application specific memory, application specific DRAM, massively parallel computer, massively parallel computing, massively parallel SIMD, MPP, IRAM, DSP, VLSI, logic enhanced memory, logic enhanced DRAM, merged DRAM-logic, MPP applications, graphics, digital signal processing, image processing, image compression, scientific computing, database

Duncan's home page