When NOT to Use Core Generators

Application Note by P.A. Marshall for the JPEG2000 group.

Despite what one might expect, using core generators to create components does not always yield optimal results. In some cases, the generated core can lead to a longer critical path than an equivalent hand-written description. This can be due to the way the core is laid out, or to extra signals that the design does not need.

For example, in the design of the DWT module, inferred block RAM was used for testbenching purposes. When synthesized with the inferred block RAM, a minimum clock period of 8.855ns was obtained (5.585ns logic and 3.270ns routing). This corresponds to a maximum clock frequency of 112.931MHz. When the design was changed to use the LogiCore block RAM, the minimum clock period jumped to 13.408ns, despite the fact that nothing else in the design was changed! This corresponds to a maximum clock frequency of 74.582MHz, almost 34% lower than with the inferred block RAM. The critical path was made up of 10.168ns logic and 3.240ns routing.
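The figures above can be checked with a short worked calculation. The symbols T and f (with the subscripts "inferred" and "core") are introduced here only for illustration; every number comes from the synthesis results quoted above.

```latex
\begin{align*}
T_{\text{inferred}} &= 5.585 + 3.270 = 8.855\ \text{ns}
  & f_{\text{inferred}} &= \frac{1}{T_{\text{inferred}}} \approx 112.931\ \text{MHz} \\
T_{\text{core}} &= 10.168 + 3.240 = 13.408\ \text{ns}
  & f_{\text{core}} &= \frac{1}{T_{\text{core}}} \approx 74.582\ \text{MHz} \\
1 - \frac{f_{\text{core}}}{f_{\text{inferred}}} &= 1 - \frac{8.855}{13.408} \approx 0.340 \\
\Delta T_{\text{logic}} &= 10.168 - 5.585 = 4.583\ \text{ns}
  & \Delta T_{\text{routing}} &= 3.240 - 3.270 = -0.030\ \text{ns}
\end{align*}
```

Broken down this way, essentially all of the added delay is in the logic portion of the critical path; the routing delay is almost the same in both runs.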

One possible explanation is that the dual-port RAM provided by LogiCore has separate clocks for each port. Our design used only one global clock. A dual-port RAM with one clock common to both ports requires simpler logic to perform the memory accesses. Given the performance difference found, it is surprising that LogiCore does not provide the option to use only one clock.

Since we required the ability to pre-load the memory, using inferred block RAM was not an option. However, in high-performance applications where this is not a requirement, the inferred block RAM would be preferred.

Download the behavioural RAM description. The VHDL is optimized for a dual-port memory with one read-only port and one write-only port. The ports have the same address bus and data bus width, and share a common clock. However, with a few modifications, this file could be expanded to implement a more complex block RAM.
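For readers without the file to hand, a memory of the kind described might look something like the sketch below. This is not the downloadable description itself: the entity name, port names, and default generic values are all assumptions chosen for illustration, and whether a given synthesis tool maps the array onto block RAM depends on the tool and the target device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: the name, ports, and default widths are
-- illustrative assumptions, not taken from the original file.
entity simple_dp_ram is
  generic (
    ADDR_WIDTH : natural := 9;  -- assumed depth of 2**9 = 512 words
    DATA_WIDTH : natural := 8   -- assumed 8-bit data words
  );
  port (
    clk     : in  std_logic;  -- single clock shared by both ports
    we      : in  std_logic;  -- write enable for the write-only port
    wr_addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    wr_data : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    rd_addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    rd_data : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity simple_dp_ram;

architecture behavioural of simple_dp_ram is
  -- The memory is an array of words; both ports use the same widths.
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Write-only port: store the word when write enable is high.
      if we = '1' then
        ram(to_integer(unsigned(wr_addr))) <= wr_data;
      end if;
      -- Read-only port: registered read on the same clock edge.
      -- A simultaneous read and write to one address returns the
      -- old contents, because the signal updates after the process.
      rd_data <= ram(to_integer(unsigned(rd_addr)));
    end if;
  end process;
end architecture behavioural;
```

Because both ports sit in one clocked process, there is no second clock domain to account for, which is the simplification discussed above. Adding a second write enable or a second clock would be the kind of modification needed for a more general block RAM.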