Abstract
A general and robust loop transformation framework that enables compilers to generate efficient code on complex loop nests and shows performance results on automaticallygenerated code for the Pentium M and MIPS R10000 that are comparable to the best hand-tuned codes, and significantly better than the native compilers. This paper describes a general and robust loop transformation framework that enables compilers to generate efficient code on complex loop nests. Despite two decades of prior research on loop optimization, performance of compiler-generated code often falls short of manually optimized versions, even for some well-studied BLAS kernels. There are two primary reasons for this. First, today’s compilers employ fixed transformation strategies, making it difficult to adapt to different optimization requirements for different application codes. Second, code transformations are treated in isolation, not taking into account the interactions between different transformations. This paper addresses such limitations in a unified framework that supports a broad collection of transformations, (permutation, tiling, unroll-and-jam, data copying, iteration space splitting, fusion, distribution and others), which go beyond existing polyhedral transformation models. This framework is a key element of a compiler we are developing which performs empirical optimization to evaluate a collection of alternative optimized variants of a code segment. A script interface to code generation and empirical search permits transformation parameters to be adjusted independently and tested; alternative scripts are used to represent different code variants. By applying this framework to example codes, we show performance results on automaticallygenerated code for the Pentium M and MIPS R10000 that are comparable to the best hand-tuned codes, and significantly better (up to a 14x speedup) than the native compilers.
Figure
figure 1
figure 2
figure 3
figure 4
figure 5
figure 7
figure 8
figure 11
figure 12
figure 14
figure 15
figure 16
Table
table 1