Paper Archive

[CVPR'16] Tien-Ju Yang, et al.

Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning

Abstract This work proposes an energy-aware pruning algorithm for CNNs that directly uses the energy consumption of a CNN to guide the pruning process, and shows that reducing the number of target...
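Note: a minimal Python sketch of the general idea of pruning guided by an energy estimate. The energy model (weighted MACs plus memory accesses), the cost constants, and the layer numbers are illustrative assumptions, not the paper's measured model or algorithm.

```python
# Hypothetical sketch: repeatedly shrink the layer with the highest *estimated*
# energy until a target budget is met. E_MAC/E_MEM are assumed relative costs.
E_MAC, E_MEM = 1.0, 6.0

def estimated_energy(layer):
    return layer["macs"] * E_MAC + layer["mem_accesses"] * E_MEM

def energy_aware_prune(layers, budget, shrink=0.9):
    while sum(map(estimated_energy, layers)) > budget:
        worst = max(layers, key=estimated_energy)   # most energy-hungry layer first
        worst["macs"] *= shrink                     # pruning weights removes MACs...
        worst["mem_accesses"] *= shrink             # ...and the data movement for them
    return layers

layers = [{"name": "conv1", "macs": 1e8, "mem_accesses": 2e6},
          {"name": "conv2", "macs": 3e8, "mem_accesses": 5e6}]
energy_aware_prune(layers, budget=2.5e8)
for l in layers:
    print(l["name"], f"{estimated_energy(l) / 1e6:.0f} M energy units")
```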

[MICRO'16] Cambricon-X

Cambricon-X: An accelerator for sparse neural networks

Abstract Neural networks (NNs) have been demonstrated to be useful in a broad range of applications such as image recognition, automatic translation and advertisement recommendation. State-of-the-...

[MICRO'16] Fused-Layer

Fused-layer CNN accelerators

Abstract Deep convolutional neural networks (CNNs) are rapidly becoming the dominant approach to computer vision and a major component of many other pervasive machine learning tasks, such as speec...
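Note: a minimal Python sketch of the layer-fusion idea, using two 1-D convolutions so that only a tile of the intermediate feature map ever exists at once. Tile width, kernels, and the two-layer setup are illustrative assumptions, not the paper's dataflow.

```python
# Hypothetical sketch: evaluate two stacked valid convolutions one output tile
# at a time, so the full intermediate feature map is never materialized.
def conv1d(x, w):
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def fused_two_layers(x, w1, w2, tile=8):
    k1, k2 = len(w1), len(w2)
    out_len = len(x) - k1 - k2 + 2            # length after both valid convolutions
    out = []
    for start in range(0, out_len, tile):
        width = min(tile, out_len - start)
        # Input slice needed to produce this output tile through both layers.
        x_tile = x[start : start + width + k1 + k2 - 2]
        mid_tile = conv1d(x_tile, w1)         # intermediate kept only tile-sized
        out.extend(conv1d(mid_tile, w2))
    return out

x = list(range(32))
w1, w2 = [1, 0, -1], [0.5, 0.5]
assert fused_two_layers(x, w1, w2) == conv1d(conv1d(x, w1), w2)
```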

[ISCA'16] Cambricon

Cambricon: An Instruction Set Architecture for Neural Networks

Abstract Neural networks (NNs) are a family of models for emerging machine learning and pattern recognition applications. NN techniques are conventionally executed on general-purpose processors (CPUs and GPGPUs), which invest excessive hardware resources to flexibly support various workloads and are therefore not energy-efficient. Consequently, to improve energy efficiency, recent work has...

[ISCA'16] Eyeriss

Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks

Abstract Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity. The complexity comes from the ...

[TOPLAS'16] Pluto+

The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests

Abstract The Pluto+ framework, which admits a much larger space of practically useful affine transformations in conjunction with the existing cost function of Pluto, is proposed and extended in a way that allo...

[FPGA'16] Jiantao Qiu, et al.

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network

Abstract This paper presents an in-depth analysis of state-of-the-art CNN models and shows that Convolutional layers are computational-centric and Fully-Connected layers are memory-centric, and pr...
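Note: a back-of-the-envelope Python check of the computation-centric vs. memory-centric claim, counting MACs per weight fetched. The layer shapes are roughly VGG-like illustrative assumptions, not the paper's measurements.

```python
# Hypothetical sketch: conv layers reuse each weight across the whole output
# map, while a fully-connected layer does about one MAC per weight.
def conv_stats(c_in, c_out, k, h_out, w_out):
    macs = c_in * c_out * k * k * h_out * w_out
    weights = c_in * c_out * k * k
    return macs, weights

def fc_stats(n_in, n_out):
    return n_in * n_out, n_in * n_out

for name, (macs, weights) in {
    "conv 3x3, 256->256, 56x56 output": conv_stats(256, 256, 3, 56, 56),
    "fc 4096->4096":                     fc_stats(4096, 4096),
}.items():
    print(f"{name}: {macs/1e6:.0f} M MACs, {weights/1e6:.2f} M weights, "
          f"{macs // weights} MACs per weight")
```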

[ISCA'16] EIE

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Abstract State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded system...
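Note: a minimal Python sketch of the kind of computation EIE accelerates, a matrix-vector product over a compressed sparse weight matrix that also skips zero activations. The CSC-style layout here is a simplification of the paper's encoding (which additionally uses weight sharing and relative indexing).

```python
# Hypothetical sketch: exploit static weight sparsity (store only nonzeros per
# column) and dynamic activation sparsity (skip zero inputs).
def to_csc(dense):
    # dense: list of rows -> per-column lists of (row_index, nonzero_value)
    return [[(i, row[j]) for i, row in enumerate(dense) if row[j] != 0]
            for j in range(len(dense[0]))]

def sparse_matvec(csc_cols, x, n_rows):
    y = [0.0] * n_rows
    for j, xj in enumerate(x):
        if xj == 0:                 # dynamic sparsity: skip zero activations
            continue
        for i, w in csc_cols[j]:    # static sparsity: only stored nonzeros
            y[i] += w * xj
    return y

W = [[0, 2, 0],
     [1, 0, 0],
     [0, 0, 3]]
x = [4, 0, 5]
print(sparse_matvec(to_csc(W), x, n_rows=len(W)))   # [0.0, 4.0, 15.0]
```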

[JSSC'16] Eyeriss

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

Abstract Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes for the energy efficiency of the entire system, including the accelerator chip and ...

[MICRO'16] Stripes

Stripes: Bit-serial deep neural network computing

Abstract Stripes (STR) uses bit-serial compute units and the parallelism naturally present in DNNs to improve performance and energy without any loss in accuracy, and provides a new degree of adaptivity that enables on-the-fly trade-offs among accuracy, performance, and energy. Inspired by the variability in the numerical precision requirements of deep neural networks (DNNs) [1], [2], ...
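Note: a minimal Python sketch of the bit-serial arithmetic behind the abstract's claim that runtime scales with the precision actually needed. This illustrates only the shift-and-add math, not the accelerator's datapath; the operand values and precision are made up.

```python
# Hypothetical sketch: compute a weighted sum by streaming one bit-plane of the
# activations per "cycle", so cycles scale with the precision p instead of a
# fixed 16 bits.
def bit_serial_dot(activations, weights, precision):
    acc = 0
    for bit in range(precision):                 # one bit-plane per cycle, LSB first
        plane = sum(((a >> bit) & 1) * w for a, w in zip(activations, weights))
        acc += plane << bit                      # shift-and-add recombines the planes
    return acc

acts = [5, 3, 7]            # unsigned activations that fit in `precision` bits
wts  = [2, -1, 4]
assert bit_serial_dot(acts, wts, precision=3) == sum(a * w for a, w in zip(acts, wts))
# Halving the precision halves the serial cycle count, which is the
# accuracy/performance/energy trade-off the abstract describes.
```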