Paper Archive

[CVPR'14] Christian Szegedy, et al.

Going deeper with convolutions

Abstract A deep convolutional neural network architecture codenamed Inception is proposed that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual...

Posted by Christian Szegedy, et al. on September 16, 2014

[CVIU'14] Denis Fortun, et al.

Aggregation of local parametric candidates with exemplar-based occlusion handling for optical flow

Abstract Semantic Scholar extracted view of "Aggregation of local parametric candidates with exemplar-based occlusion handling for optical flow" by Denis Fortun et al.

Posted by Denis Fortun, et al. on June 4, 2014

[BMVC'14] Max Jaderberg, et al.

Speeding up Convolutional Neural Networks with Low Rank Expansions

Abstract Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filt...

Posted by Max Jaderberg, et al. on May 15, 2014

[NIPS'14] Emily L. Denton, et al.

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation

Abstract Using large state-of-the-art models, this work demonstrates speedups of convolutional layers on both CPU and GPU by a factor of 2×, while keeping the accuracy within 1% of the original m...

Posted by Emily L. Denton, et al. on April 2, 2014

[ASPLOS'14] DianNao

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning

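Abstract Machine-learning tasks are becoming pervasive across a broad range of domains and systems, from embedded systems to data centers. At the same time, a small set of machine-learning algorithms (especially convolutional and deep neural networks, i.e., CNNs and DNNs) are proving to be state of the art across many applications. As architectures evolve toward heterogeneous multi-cores composed of a mix of cores and accelerators, ...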

Posted by Tianshi Chen, et al. on February 24, 2014

[SC'13] Uday Bondhugula

Compiling affine loop nests for distributed-memory parallel architectures

Abstract To the best of the author's knowledge, this is the first work reporting end-to-end fully automatic distributed-memory parallelization and code generation for input programs and transformation tech...

Posted by Uday Bondhugula on November 17, 2013

[ISCA'13] Triggered instructions

Triggered instructions: a control paradigm for spatially-programmed architectures

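Abstract This paper introduces triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. Also, with respect to inter-PE communication traffic, ...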

Posted by A. Parashar, et al. on June 23, 2013

[PLDI'13] Halide

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

Abstract A systematic model of the tradeoff space fundamental to stencil pipelines is presented, a schedule representation which describes concrete points in this space for each stage in an image ...

Posted by Jonathan Ragan-Kelley, et al. on June 16, 2013

[DAC'13] PolyCGRA

Polyhedral model based mapping optimization of loop nests for CGRAs

Abstract The coarse-grained reconfigurable architecture (CGRA) is a promising platform that provides both high performance and high power-efficiency. The compute-intensive portions of an applicati...

Posted by Dajiang Liu, et al. on May 29, 2013

[PPL'12] Polly

Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation

Abstract Polly is presented, an infrastructure for polyhedral optimizations on the compiler's internal, low-level, intermediate representation (IR) and an interface for connecting external optimiz...

Posted by T. Grosser, et al. on December 27, 2012