[MICRO'19] ExTensor

ExTensor: An Accelerator for Sparse Tensor Algebra

Kartik Hegde, et.al. on October 12, 2019
doi.org
obsidian에서 수정하기

Abstract

Generalized tensor algebra is a prime candidate for acceleration via customized ASICs. Modern tensors feature a wide range of data sparsity, with the density of non-zero elements ranging from 10-6% to 50%. This paper proposes a novel approach to accelerate tensor kernels based on the principle of hierarchical elimination of computation in the presence of sparsity. This approach relies on rapidly finding intersections—situations where both operands of a multiplication are non-zero—enabling new data fetching mechanisms and avoiding memory latency overheads associated with sparse kernels implemented in software. We propose the ExTensor accelerator, which builds these novel ideas on handling sparsity into hardware to enable better bandwidth utilization and compute throughput. We evaluate ExTensor on several kernels relative to industry libraries (Intel MKL) and state-of-the-art tensor algebra compilers (TACO). When bandwidth normalized, we demonstrate an average speedup of 3.4×, 1.3×, 2.8×, 24.9×, and 2.7× on SpMSpM, SpMM, TTV, TTM, and SDDMM kernels respectively over a server class CPU.

Figure

figure 1 figure 1

figure 2 figure 2

figure 3 figure 3

figure 4 figure 4

figure 5 figure 5

figure 6 figure 6

figure 7 figure 7

figure 8 figure 8

figure 9 figure 9

figure 10 figure 10

figure 11 figure 11

figure 12 figure 12

figure 13 figure 13

figure 14 figure 14

figure 15 figure 15

figure 16 figure 16

figure 17 figure 17

figure 18 figure 18

figure 19 figure 19

figure 20 figure 20

figure 21 figure 21

Table

table 1 table 1

table 2 table 2