Abstract
Precision tuning after training is often needed for efficient implementation of deep neural networks, especially when the inference platform is resource constrained. While previous works have proposed many ad hoc strategies for this task, this paper describes a general method for allocating precision to the data of trained deep neural networks based on a property relating errors across a network. We demonstrate that the precision results of previous works, both on hardware accelerators and on understanding cross-layer precision requirements, are subsumed by the proposed general method, which achieves 29% and 46% energy savings over the state-of-the-art search-based method for GoogLeNet and VGG-19, respectively. The proposed precision allocation method can be used to optimize for different criteria derived from hardware design constraints, and allocates precision at the granularity of layers for very deep networks such as ResNet-152, which hitherto was not achievable.
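The abstract describes allocating per-layer precision to a trained network's data under hardware constraints. As a minimal illustrative sketch only (the paper's actual allocation rule is not reproduced here), the toy below greedily lowers each layer's bit-width while a simple per-layer quantization-error budget, which is an assumption of this sketch, is respected:

```python
import random

def quantize(values, bits):
    """Uniform symmetric quantization of a list of floats to `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # number of positive levels
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]

def quant_error(values, bits):
    """Mean absolute quantization error at the given bit-width."""
    q = quantize(values, bits)
    return sum(abs(v - w) for v, w in zip(values, q)) / len(values)

def allocate_precision(layers, budget_per_layer, max_bits=16, min_bits=2):
    """Greedily pick the smallest per-layer bit-width whose error fits the budget.

    This greedy rule is a placeholder for the paper's analytical allocation.
    """
    alloc = []
    for weights in layers:
        bits = max_bits
        while bits > min_bits and quant_error(weights, bits - 1) <= budget_per_layer:
            bits -= 1
        alloc.append(bits)
    return alloc

# Four hypothetical layers of Gaussian weights stand in for a trained network.
random.seed(0)
layers = [[random.gauss(0, 1) for _ in range(256)] for _ in range(4)]
print(allocate_precision(layers, budget_per_layer=1e-2))
```

A real allocator would score error propagated to the network output rather than per-layer weight error, which is exactly the kind of cross-layer relationship the paper's method formalizes.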
[Figures 1-4 omitted]
[Tables I-III omitted]