Abstract
With their algorithmic success in many machine learning tasks and applications, deep convolutional neural networks (DCNNs) have been implemented with custom hardware in a number of prior works. However, such works have not fully exploited conditional/approximate computing to eliminate redundant computations of CNNs. This article presents a DCNN accelerator featuring a novel conditional computing scheme that synergistically combines precision cascading (PC) with zero skipping (ZS). To reduce the many redundant convolutions that are followed by max-pooling operations, we propose precision cascading, where the input features are divided into a number of low-precision groups and approximate convolutions using only the most significant bits (MSBs) are performed first. Based on this approximate computation, the full-precision convolution is performed only at the position where the maximum pooling output is found. This way, the total number of bit-wise convolutions can be reduced by $\sim 2\times$ with < 0.8% degradation in ImageNet accuracy. PC provides the added benefit of increased sparsity per low-precision group, which we exploit with ZS to eliminate the corresponding clock cycles and external memory accesses. The proposed conditional computing scheme has been implemented with a custom architecture in a 40-nm prototype chip, which achieves a peak energy efficiency of 24.97 TOPS/W at 0.6-V supply and a low external memory access rate of 0.0018 access/MAC with the VGG-16 CNN for ImageNet classification, and a peak energy efficiency of 28.51 TOPS/W at 0.9-V supply with FlowNet for the Flying Chairs data set.
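The core PC idea described above can be illustrated functionally: run a cheap MSB-only convolution at every position in a max-pooling window, then spend full precision only on the predicted winner. The following is a minimal NumPy sketch under assumed parameters (8-bit activations, a 4-bit MSB group, a 2x2 pooling window flattened to four positions); the function name and grouping are illustrative and do not reflect the chip's actual datapath or ZS logic.

```python
import numpy as np

def pc_maxpool_conv(x_win, w, msb_bits=4, total_bits=8):
    """Precision-cascaded convolution over one max-pooling window.

    x_win: (P, K) integer activations -- P pooling positions, K taps each.
    w:     (K,)  integer weights.
    Returns the full-precision dot product at the predicted max position.
    """
    # Phase 1: approximate convolutions using only the MSB group.
    shift = total_bits - msb_bits
    msb = (x_win >> shift) << shift   # keep top msb_bits, zero the LSBs
    approx = msb @ w                  # P cheap low-precision convolutions
    winner = int(np.argmax(approx))   # predicted max-pool winner

    # Phase 2: full-precision convolution only at the winning position.
    # (The remaining P-1 full-precision convolutions are skipped entirely.)
    return int(x_win[winner] @ w)

# Four pooling positions, two taps each; the MSB pass picks position 1,
# and only that position is evaluated at full precision.
x = np.array([[16, 16], [240, 240], [32, 32], [0, 0]])
w = np.array([1, 1])
print(pc_maxpool_conv(x, w))  # 480
```

Note that zeroing the LSBs also raises the number of all-zero operands in the MSB pass, which is the extra sparsity the abstract's ZS scheme exploits.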
Fig. 1. PC multiplication of the input feature by weight.
Fig. 2. Conceptual operation of the PC scheme.
Fig. 3. Overall latency of the PC scheme compared to the non-PC scheme.
Table I. Statistics About the Percentage of the Max-Pooling Results Found in Each Precision Group.