Optimal organizations for pipelined memory hierarchies

Bilardi, Gianfranco; Ekanadham, K.; Pattnaik, P.

doi:10.1145/564870.564886

In a recent paper (SPAA'01), we have established that the Pipelined Hierarchical Random Access Machine (PH-RAM) is a powerful model of computation, where most of the memory latency can be hidden by concurrency of accesses. In the present work, we explore the physical feasibility of PH-RAMs.

A pipelined hierarchical memory of size $S$ is characterized by two metrics: the access function $a(x)$, denoting the time required by an access to location $x$, and the pipeline period $p(S)$, denoting the minimum time between subsequent accesses that can be sustained. Physical constraints on minimum device size and maximum signal speed imply that, for a memory laid out in $d$ dimensions, $a(x) = \Omega(x^{1/d})$. We propose a novel memory organization scheme that can be specialized to yield optimal performance, $a(x) = O(x^{1/d})$ and $p(S) = O(1)$, for any $d \geq 1$.

Managing a large number of concurrent load and store instructions would pose a significant burden on a traditional RISC processor, requiring both a large register file and complex logic to properly synchronize instructions. We show how these obstacles can be circumvented by introducing the Scalable transPORT (SPORT) computer, where a simple processor drives a version of our pipelined hierarchical memory capable of servicing memory-to-memory instructions. We show that SPORT provides a feasible, scalable implementation of the PH-RAM model.
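
The lower bound $a(x) = \Omega(x^{1/d})$ stated in the abstract can be motivated by a short packing argument. The following is only a sketch under assumptions not spelled out in the abstract: each memory cell occupies at least a volume $v_0 > 0$, signals travel at speed at most $c$, $\kappa_d$ denotes the volume of the unit ball in $d$ dimensions, $a$ is nondecreasing, and $r$ is the distance from the processor to the farthest of the first $x$ cells. The symbols $v_0$, $c$, $\kappa_d$ and $r$ are introduced here for illustration.

$$
x\,v_0 \;\le\; \kappa_d\, r^d
\quad\Longrightarrow\quad
r \;\ge\; \left(\frac{v_0}{\kappa_d}\right)^{1/d} x^{1/d}
\quad\Longrightarrow\quad
a(x) \;\ge\; \frac{r}{c} \;\ge\; \frac{1}{c}\left(\frac{v_0}{\kappa_d}\right)^{1/d} x^{1/d} \;=\; \Omega\!\left(x^{1/d}\right).
$$

For instance, under these assumptions a planar layout ($d = 2$) gives access time growing at least as $\sqrt{x}$, and a three-dimensional layout at least as $x^{1/3}$.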