[JETCAS'20] SRNPU

SRNPU: An Energy-Efficient CNN-Based Super-Resolution Processor With Tile-Based Selective Super-Resolution in Mobile Devices

Juhyoung Lee, et al. on August 5, 2020
doi.org

Abstract

In this article, we propose an energy-efficient convolutional neural network (CNN) based super-resolution (SR) processor, the super-resolution neural processing unit (SRNPU), for mobile applications. Traditionally, it is hard to realize real-time CNN-based SR on resource-limited platforms like mobile devices because of its massive computational workload and the communication bandwidth it needs with external memory. The SRNPU supports tile-based selective super-resolution (TSSR), which dynamically selects a properly sized CNN in a tile-by-tile manner. TSSR reduces the computational workload of CNN-based SR by 31.1% while maintaining image restoration performance. Moreover, the proposed selective caching based convolutional layer fusion (SC²LF) reduces external memory bandwidth by 78.8% with a 93.2% smaller on-chip memory footprint compared with previous layer fusion methods, by caching only the intermediate feature maps with a short reuse distance. Additionally, the reconfigurable cyclic ring architecture in the SRNPU maintains high PE utilization by amortizing the reloading process caused by SC²LF operation under various convolutional layer configurations. The SRNPU is fabricated in 65 nm CMOS technology and occupies a $4 \times 4$ mm² die area. The SRNPU has a peak power efficiency of 1.9 TOPS/W at 0.75 V, 50 MHz. The SRNPU achieves 31.8 fps $\times 2$ scale Full-HD generation and 88.3 fps $\times 4$ scale Full-HD generation with higher restoration performance and power efficiency than previous SR hardware implementations. To the best of our knowledge, the SRNPU is the first ASIC implementation of the CNN-based SR algorithm which supports real-time Full-HD up-scaling.
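To make the tile-by-tile selection idea concrete, here is a minimal Python sketch. It is not the SRNPU's actual method: the abstract does not say how TSSR decides which CNN a tile gets, so the pixel-variance criterion, the `threshold` value, the `"small"`/`"large"` labels, and the per-tile cost numbers are all hypothetical and used only for illustration.

```python
from statistics import pvariance


def split_into_tiles(image, tile):
    """Yield (row, col, block) for non-overlapping tile x tile blocks.

    `image` is a list of rows of grayscale pixel values.
    """
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            block = [image[y][c:c + tile] for y in range(r, min(r + tile, height))]
            yield r, c, block


def select_network(block, threshold):
    """Assumed selector: flat tiles get the small CNN, detailed tiles the large one.

    Pixel variance is a stand-in criterion, not the one used by the SRNPU.
    """
    pixels = [p for row in block for p in row]
    return "large" if pvariance(pixels) > threshold else "small"


def workload_saving(image, tile, threshold, cost_small, cost_large):
    """Return per-tile choices and the fraction of work saved
    relative to running the large CNN on every tile."""
    choices = [select_network(b, threshold) for _, _, b in split_into_tiles(image, tile)]
    baseline = cost_large * len(choices)
    actual = sum(cost_large if c == "large" else cost_small for c in choices)
    return choices, 1.0 - actual / baseline


# Toy 4x4 image: only the top-right 2x2 tile contains strong detail.
demo_image = [
    [10, 10, 0, 255],
    [10, 10, 255, 0],
    [50, 50, 7, 7],
    [50, 50, 7, 7],
]
demo_choices, demo_saving = workload_saving(
    demo_image, tile=2, threshold=100.0, cost_small=1.0, cost_large=4.0
)
```

In this toy setup only one of the four tiles is routed to the larger network. The saving depends entirely on the assumed costs and threshold, so it says nothing about the 31.1% figure reported in the abstract.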

Figure

Figures 1 to 16 (images not reproduced here)

Table

Tables I to IV (contents not reproduced here)