Paper Archive

[JETCAS'18] Eyeriss v2

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

Abstract Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs, is presented, which introduces a highly flexible on-chip network that can adapt to the different a...

Posted by Yu-hsin Chen, et al. on July 10, 2018

[PLDI'18] Spatial

Spatial: a language and compiler for application accelerators

Corresponding author: Olukotun. Abstract This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compi...

Posted by D. Koeplinger, et al. on June 11, 2018

[ISCA'18] GANAX

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks

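Abstract Generative adversarial networks (GANs) are among the most recent deep learning models that generate synthetic data from limited genuine datasets. GANs are highly important because extending deep learning into many domains (e.g., medicine, robotics, content synthesis) requires large labeled datasets. Such datasets are generally unavailable, or the cost of collecting them is...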

Posted by A. Yazdanbakhsh, et al. on May 10, 2018

[DPCC'18] MAESTRO

Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach Using MAESTRO

Abstract The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse and perform staging are known as dataflow, and they directly impact the performance and energy e...

Posted by Hyoukjun Kwon, et al. on May 4, 2018

[MICRO'18] MAESTRO

Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach

Abstract The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse and perform staging are known as dataflow, which directly impacts the performance and energy eff...

Posted by Hyoukjun Kwon, et al. on May 4, 2018

[CGO'18] Tiramisu

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

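Abstract This paper introduces Tiramisu, a polyhedral framework designed to generate high-performance code for multiple platforms, including multicores, GPUs, and distributed systems. Tiramisu introduces a scheduling language with new commands to explicitly manage the complexities that arise when targeting these systems. The framework is designed for image processing, stencils, linear...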

Posted by Riyadh Baghdadi, et al. on April 27, 2018

[HPCA'18] OuterSPACE

OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator

Abstract Sparse matrices are widely used in graph and data analytics, machine learning, engineering and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator target...

Posted by S. Pal, et al. on March 27, 2018

[ASPLOS'18] MAERI

MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects

Abstract Deep neural networks (DNN) have demonstrated highly promising results across computer vision and speech recognition, and are becoming foundational for ubiquitous AI. The computational com...

Posted by Hyoukjun Kwon, et al. on March 19, 2018

[HCS'18] Bit-Tactical

Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How

Abstract We show that, during inference with convolutional neural networks (CNNs), targeting different combinations of value stream properties, instead of targeting only the weights and activations that are zero, can expose from 2x to more than 8x more ineffectual work...

Posted by A. Delmas, et al. on March 9, 2018

[CC'18] Spatial Locality

Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling

Abstract An algorithmic template capable of modeling the multi-level parallelism and the temporal/spatial locality of multiprocessors and accelerators is proposed and effective algorithms can be d...

Posted by O. Zinenko, et al. on February 24, 2018