This paper describes a scheme to implement a shared address space of size m on an n-node mesh, with m polynomial in n, where each mesh node hosts a processor and a memory module. At the core of the simulation is a hierarchical memory organization scheme (HMOS), which governs the distribution of the shared variables, each replicated into multiple copies, among the memory modules, through a cascade of bipartite graphs. Based on the expansion properties of such graphs, we devise a protocol that accesses any n-tuple of shared variables in worst-case time $O(n^{1/2+\eta})$, for any constant $\eta > 0$, using $O(1/\eta^{1.59})$ copies per variable, or in worst-case time O(n1/2 log n), using O(log1.59n ) copies per variable. In both cases the access time is close to the natural $O(\sqrt{n})$ lower bound imposed by the network diameter. A key feature of the scheme is that it can be made fully constructive when m is not too large, thus providing in this case the first efficient, constructive, deterministic scheme in the literature for bounded-degree processor networks. For larger memory sizes, the scheme relies solely on a nonconstructive graph of weak expansion. Finally, the scheme can be efficiently ported to other architectures, as long as they exhibit certain structural properties. In the paper we discuss the porting to multidimensional meshes and to the pruned butterfly, an area-universal network which is a variant of the fat-tree.

Constructive, deterministic implementation of shared memory on meshes

PIETRACAPRINA, ANDREA ALBERTO;PUCCI, GEPPINO;
2000

Abstract

This paper describes a scheme to implement a shared address space of size m on an n-node mesh, with m polynomial in n, where each mesh node hosts a processor and a memory module. At the core of the simulation is a hierarchical memory organization scheme (HMOS), which governs the distribution of the shared variables, each replicated into multiple copies, among the memory modules, through a cascade of bipartite graphs. Based on the expansion properties of such graphs, we devise a protocol that accesses any n-tuple of shared variables in worst-case time $O(n^{1/2+\eta})$, for any constant $\eta > 0$, using $O(1/\eta^{1.59})$ copies per variable, or in worst-case time O(n1/2 log n), using O(log1.59n ) copies per variable. In both cases the access time is close to the natural $O(\sqrt{n})$ lower bound imposed by the network diameter. A key feature of the scheme is that it can be made fully constructive when m is not too large, thus providing in this case the first efficient, constructive, deterministic scheme in the literature for bounded-degree processor networks. For larger memory sizes, the scheme relies solely on a nonconstructive graph of weak expansion. Finally, the scheme can be efficiently ported to other architectures, as long as they exhibit certain structural properties. In the paper we discuss the porting to multidimensional meshes and to the pruned butterfly, an area-universal network which is a variant of the fat-tree.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/1359906
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 2
social impact