[SC'13] Uday Bondhugula

Compiling affine loop nests for distributed-memory parallel architectures

Uday Bondhugula on November 17, 2013

Abstract

We present new techniques for compilation of arbitrarily nested loops with affine dependences for distributed-memory parallel architectures. Our framework is implemented as a source-level transformer that uses the polyhedral model and generates parallel code with communication expressed with the Message Passing Interface (MPI) library. Compared to all previous approaches, ours is a significant advance in (1) the generality of input code handled, (2) the efficiency of communication code, or both. We provide experimental results on a cluster of multicores demonstrating its effectiveness. In some cases, the code we generate outperforms manually parallelized codes, and in another case it is within 25% of them. To the best of our knowledge, this is the first work reporting end-to-end fully automatic distributed-memory parallelization and code generation for input programs and transformation techniques as general as those we allow.

Figures: 2, 3, 4, 6, 7 (not included in this clipping)

Tables: 1, 2 (not included in this clipping)