M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis et al., Xiaoqiang Zheng, and Google Brain. TensorFlow: A System for Large-Scale Machine Learning, Proceedings of the USENIX Symposium on Operating Systems Design and Implementation -OSDI '16, 2016.

E. Ardizzone, A. Bruno, and G. Mazzola, Saliency Based Image Cropping, Proceedings of the International Conference on Image Analysis and Processing -ICIAP '13, 2013.

M. Alwani, H. Chen, M. Ferdman, and P. Milder, Fused-layer CNN accelerators, Proceedings of the Annual International Symposium on Microarchitecture -MICRO '16, 2016.

R. Andri, L. Cavigelli, D. Rossi, and L. Benini, YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights, Proceedings of the IEEE Computer Society Annual Symposium on VLSI -ISVLSI '16, 2016.

A. Alaghi and J. P. Hayes, Survey of Stochastic Computing, ACM Transactions on Embedded Computing Systems, 2013.

A. Alaghi, P. John, and . Hayes, Fast and Accurate Computation using Stochastic Circuits, Proceedings of the Design, Automation & Test in Europe Conference & Exhibition -DATE '14, 2014.

S. Anwar, K. Hwang, and W. Sung, Fixed point optimization of deep convolutional neural networks for object recognition, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing -ICASSP '15, 2015.

R. Atul, L. Jongeun, and C. Kiyoung, Efficient FPGA acceleration of Convolutional Neural Networks using logical-3D compute array, Proceedings of the Design, Automation & Test in Europe Conference & Exhibition -DATE '16, 2016.

A. Alaghi, The Logic of Random Pulses : Stochastic Computing, 2015.

. +-15]-arash, F. Ardakani, N. Leduc-primeau, T. Onizawa, W. J. Hanyu et al., VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.25, issue.10, 2015.

. Aoc-+-17]-utku, . Aydonat, O. Shane, D. Connell, A. C. Capalija et al., An OpenCL(TM) Deep Learning Accelerator on Arria, vol.10

, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '17, 2017.

M. Kamel-abdelouahab, F. Pelcat, J. Berry, and . Sérot, Accelerating CNN inference on FPGAs: A Survey, 2018.

M. Kamel-abdelouahab and . Pelcat, Cedric Bourrasset, and Francois Berry. Tactics to Directly Map CNN graphs on Embedded FPGAs, IEEE Embedded Systems Letters, 2017.

M. Birem and F. Berry, DreamCam: A modular FPGA-based smart camera architecture, Journal of Systems Architecture, vol.60, issue.6, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01625648

D. Bradley, H. C. Brown, and . Card, Stochastic neural computation I: Computational elements, IEEE Transactions on Computers, vol.50, issue.9, 2001.

K. Benkrid, Reconfigurable Computing in the Multi-Core Era, Proceedings of the International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies -HEART '10, 2010.

S. Bka-+-16]-jeremy-bottleson, J. Kim, P. Andrews, . Bindu, N. Deepak et al., ClCaffe: OpenCL accelerated caffe for convolutional neural networks, Proceedings of the IEEE International Parallel and Distributed Processing Symposium -IPDPS'16, 2016.

L. Bottou, Large-Scale Machine Learning with Stochastic Gradient Descent, Proceedings of International Conference on Computational Statistics -COMPSTAT'10, 2010.

C. Bourrasset, High level synthesis of dataflow programs for image processing on FPGA-based smart camera. Application to machine learning, 2016.

M. Blott, T. Preusser, N. Fraser, G. Gambardella, O. Kenneth et al., FINN-R: An End-to-End DeepLearning Framework for Fast Exploration of Quantized Neural Networks, ACM Transactions on Reconfigurable Technology and Systems, 2018.

C. Bourrasset, J. Serot, and F. Berry, FPGA-based smart camera mote for pervasive wireless network, Proceedings of the International Conference on Distributed Smart Cameras -ICDSC'13, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01183679

S. Cad-+-12]-tomasz, U. Czajkowski, D. Aydonat, J. Denisenko, M. Freeman et al., From OpenCL to high-performance hardware on FPGAS, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '16, 2012.

L. Cavigelli and L. Benini, Origami : A 803 GOp/s/W Convolutional Network Accelerator, IEEE Transactions on Circuits and Systems for Video Technology, vol.8215, 2016.

M. Courbariaux, Y. Bengio, and J. David, Training deep neural networks with low precision multiplications, 2014.

M. Courbariaux, Y. Bengio, and J. David, BinaryConnect: Training Deep Neural Networks with binary weights during propagations, Advances in Neural Information Processing Systems -NIPS'15, 2015.

Y. Chen, J. Emer, and V. Sze, Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks, Proceedings of the International Symposium on Computer Architecture -ISCA '16, 2016.

E. Chb-+-16]-sebastien-caux, F. Hendrickx, M. Berry, J. Pelcat, . Serot et al., Proceedings of the International Conference on Distributed Smart Cameras -ICDSC'16, 2016.

. Cisco and . Cisco, Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2016.

A. Canziani, A. Paszke, and E. Culurciello, An Analysis of Deep Neural Network Models for Practical Applications, 2016.

K. Chellapilla, S. Puri, and P. Simard, High Performance Convolutional Neural Networks for Document Processing, Proceedings of the International Workshop on Frontiers in Handwriting Recognition -FHR'06. Suvisoft, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00112631

S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, A Dynamically Configurable Coprocessor for Convolutional Neural Networks, ACM SIGARCH Computer Architecture News, vol.38, issue.3, 2010.

J. Cong and B. Xiao, Minimizing computation in convolutional neural networks, Proceedings of the International Conference on Artificial Neural Networks -ICANN '14, 2014.

F. Dias, F. Berry, J. Serot, and F. Marmoiton, Hardware, Design and Implementation Issues on a Fpga-Based Smart Camera, Proceedings of the International Conference on Distributed Smart Cameras -ICDSC'07, 2007.
URL : https://hal.archives-ouvertes.fr/hal-01626487

V. Florent-de-dinechin and . Lefevre, Constant multipliers for FPGAs. Parallel and Distributed Processing Techniques and Applications, 2000.

. Ddl-+-18]-li, Y. Du, Y. Du, J. Li, Y. C. Su et al., A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things. IEEE Transactions on Circuits and Systems I: Regular Papers, vol.65, 2018.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ShiDianNao: Shifting vision processing closer to the sensor, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '09. IEEE, 2009.

N. Dave, K. Fleming, M. King, M. Pellauer, and M. Vijayaraghavan, Hardware acceleration of matrix multiplication on a Xilinx FPGA, Proceedings of ACM and IEEE International Conference on Formal Methods and Models for Co-Design, MEMOCODE'07, 2007.

J. David, K. Kalach, and N. Tittley, Hardware Complexity of Modular Multiplication and Exponentiation, IEEE Transactions on Computers, vol.56, issue.10, 2007.

R. Dicecco, G. Lacey, J. Vasiljevic, P. Chow, G. Taylor et al., Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks, Proceedings of the International Conference on Field-Programmable Technology -FPT '16, 2016.

B. Jack, D. P. Dennis, and . Misunas, A Preliminary Architecture for a Basic Data-flow Processor, Proceedings of the International Symposium on Computer Architecture -ISCA '75, 1975.

S. Derrien and S. Rajopadhye, Loop tiling for reconfigurable accelerators, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '01, vol.2147, 2001.

R. Dorrance, F. Ren, and D. Markovi´cmarkovi´c, A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs, Proceedings of the ACM/SIGDA International Symposium on FieldProgrammable Gate Arrays -FPGA '14, 2014.

J. G. Eldredge and B. L. Hutchings, RRANN: a hardware implementation of the backpropagation algorithm using reconfigurable FPGAs, Proceedings of the IEEE International Conference on Neural Networks -ICNN'94, vol.4, 1994.

M. Amir-erfan-eshratifar and . Pedram, Energy and Performance Efficient Computation Offloading for Deep Neural Networks in a Mobile Cloud Computing Environment, Proceedings of the Great Lakes Symposium on VLSI -GLSVLSI'18, GLSVLSI '18, 2018.

M. Everingham, L. Van-gool, K. I. Christopher, J. Williams, A. Winn et al., The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, 2010.

L. Fei-fei, R. Fergus, and P. Perona, One-shot learning of object categories, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.4, 2006.

. Fmc-+-11]-clement, B. Farabet, B. Martini, P. Corda, E. Akselrod et al., NeuFlow: A runtime reconfigurable dataflow processor for vision, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '11, 2011.

C. Farabet, J. Poulet, Y. Han, D. R. Lecun, S. Tobergte et al., CNP: An FPGA-based processor for Convolutional Networks, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '09, 2009.

T. Fujii, S. Sato, H. Nakahara, and M. Motomura, An FPGA Realization of a Deep Convolutional Neural Network Using a Threshold Neuron Pruning, Proceedings of the International Symposium on Applied Reconfigurable Computing -ARC'16, vol.9625, 2017.

P. Forsyth, R. Tang, and Z. Xu, An Empirical Study of Pruning and Quantization Methods for Neural Networks, 2017.

J. Nicholas, Y. Fraser, G. Umuroglu, M. Gambardella, P. Blott et al., Scaling Binarized Neural Networks on Reconfigurable Logic, Proceedings of the Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms -PARMA-DITAM'17, 2017.

. Gan-+-15]-suyog, A. Gupta, P. Agrawal, K. Narayanan, P. Gopalakrishnan et al., Deep Learning with Limited Numerical Precision, Proceedings of the International Conference on Machine Learning -ICML '15, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '14, 2014.

U. Gregor, J. Krzysztof, E. Geras, O. Kahou-samira, W. Aslan et al., Abdelrahman Mohamed, Matthai Philipose, Matt Richardson, and Caruana Rich. Do Deep Convolutional Neural Networks need to be deep and convolutional ?, Proceedings of the International Conference on Learning Representations -ICLR'17, 2017.

R. Girshick, J. Fast-r-cnn-;-vinayak-gokhale, A. Jin, B. Dundar, E. Martini et al., A 240 G-ops/s mobile coprocessor for deep neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '14, 2014.

P. K. Gupta and R. Kumaresan, Binary Multiplication with PN Sequences, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.36, issue.4, 1988.

P. Gysel, M. Motamedi, and S. Ghiasi, Hardwareoriented Approximation of Convolutional Neural Networks, 2016.

P. Gysel, J. Pimentel, M. Motamedi, and S. Ghiasi, Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks, IEEE Transactions on Neural Networks and Learning Systems, 2018.

G. Guidi, E. Reggiani, L. D. Tucci, G. Durelli, M. Blott et al., On How to Improve FPGA-Based Systems Design Productivity via SDAccel, Proceedings of the IEEE International Parallel and Distributed Processing Symposium -IPDPS'16, 2016.

T. Guo, Towards Efficient Deep Inference for Mobile Applications, 2017.

L. Gwc-+-17]-shasha-guo, B. Wang, Q. Chen, Y. Dou, Z. Tang et al., FixCaffe: Training CNN with Low Precision Arithmetic Operations by Fixed Point Caffe, Proceedings of the International Workshop on Advanced Parallel Processing Technologies -APPT '17. Springe, 2017.

J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy et al., Recent Advances in Convolutional Neural Networks. Pattern Recognition, 2017.

P. Gysel-;-itay-hubara, M. Courbariaux, and D. Soudry, Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks, Advances in Neural Information Processing Systems -NIPS'16, 2016.

I. Hubara, M. Courbariaux, D. Soudry, R. El-yaniv, and Y. Bengio, Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, Journal of Machine Learning Research, 2018.

S. Han, X. Liu, H. Mao, J. Pu, A. Pedram et al., EIE: Efficient Inference Engine on Compressed Deep Neural Network, Proceedings of the International Symposium on Computer Architecture -ISCA '16, vol.16, 2016.

S. Han, H. Mao, and W. J. Dally, Deep Compression -Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, Proceedings of the International Conference on Learning Representations -ICLR'16, 2016.

M. Horowitz, Computing's energy problem (and what we can do about it). In Proceedings of the IEEE International Solid-State Circuits -ISSCC '14, 2014.

S. Hengstler, D. Prashanth, S. Fong, and H. Aghajan,

. Mesheye, Proceedings of the International conference on Information processing in sensor networks -IPSN '07, 2007.

S. Han, J. Pool, J. Tran, and W. J. Dally, Learning both Weights and Connections for Efficient Neural Network, Advances in Neural Information Processing Systems -NIPS'15, 2015.

T. Highlander and A. Rodriguez, Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-andAdd, 2016.

K. He and J. Sun, Convolutional Neural Networks at Constrained Time Cost, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '15, 2015.

G. Hinton, N. Srivastava, and K. Swersky, A separate, adaptive learning rate for each connection. Slides of Lecture Neural Networks for Machine Learning, 2012.

H. David, . Hubel, N. Torsten, and . Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of physiology, vol.160, issue.1, 1962.

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '16, 2016.

F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally et al., AlexNet accuracy with 50x fewer parameters and 0.5MB Model Size. arXiv e-print, 2016.

F. Intel, Implementing Multipliers in FPGA Devices, 2004.

F. Intel, Cyclone V Device Handbook, vol.1, 2014.

F. Intel, Floating-Point IP Cores User Guide, 2014.

F. Intel, The Intel FPGA SDK for Open Computing Language (OpenCL), 2016.

F. Intel, Intel Stratix 10 Variable Precision DSP Blocks User Guide, Intel FPGA, 2017.

F. Intel, Cyclone V Device Overview, 2018.

F. Intel, Intel Stratix 10 Product Table, 2018.

S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Proceedings of the International Conference on Machine Learning -ICML '15, vol.37, 2015.

J. Jang, S. Choi, K. Viktor, and . Prasanna, Area and Time E ffi cient Implementations of Matrix Multiplication on FPGAs, Proceedings of the International Conference on Field-Programmable Technology -FPT'02, 2002.

S. B. Ju-wook-jang, V. K. Choi, and . Prasanna, Energy-and timeefficient matrix multiplication on FPGAs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2005.

H. Jiang, J. Han, F. Lombardi-;-yangqing-jia, E. Shelhamer, J. Donahue et al., Caffe: Convolutional Architecture for Fast Feature Embedding, Proceedings of the ACM International Conference on Multimedia -MM'14, 2014.

P. Diederik, J. Kingma, and . Ba, Adam: A Method for Stochastic Optimization, Proceedings of the International Conference on Learning Representations -ICLR'15, 2014.

I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, and S. Abu-elhaija, Openimages: A public dataset for large-scale multi-label and multiclass image classification, 2016.

S. Kestur, J. D. Davis, and E. S. Chung, Towards a universal FPGA matrix-vector multiplication architecture, Proceedings of the IEEE International Symposium on Field-Programmable Custom Computing Machines -FCCM'12, 2012.

. Khronos and . Opencl, The open standard for parallel programming of heterogeneous systems, 2015.

. +-16]-kyounghoon, J. Kim, J. Kim, J. Yu, J. Seo et al., Dynamic Energy-accuracy Trade-off Using Stochastic Computing in Deep Neural Networks, Proceedings of the Annual Conference on Design Automation -DAC '16, 2016.

B. Jong-hwan-ko, T. Ahmad-mudassar, S. Na, and . Mukhopadhyay, Design of an Energy-Efficient Accelerator for Training of Convolutional Neural Networks using Frequency-Domain Computation, Proceedings of the Annual Conference on Design Automation -DAC '17, 2017.

M. Konrad, Run-time Recongurable Constant Multiplication on Field Programmable Gate Arrays, 2017.

A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, 2009.

A. Krizhevsky, I. Sutskever, H. Geoffrey, E. , and G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Advances in Neural Information Processing Systems -NIPS'12, 2012.
DOI : 10.1145/3065386

URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf

M. Kümmerer, L. Theis, and M. Bethge, Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet, Proceedings of the International Conference on Learning Representations -ICLR'15, 2015.

Y. Lecun, Y. Bottou, P. Bengio, B. E. Haffner-;-yann-lecun, J. S. Boser et al., Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems -NIPS'90, 1990.

Y. Lecun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, 2015.

M. Lin, Q. Chen, and S. Yan, Network In Network. arXiv preprint, 2013.

Z. Liu, Y. Dou, J. Jiang, J. Xu, S. Li et al., Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks, ACM Transactions on Reconfigurable Technology and Systems, vol.10, issue.3, 2017.
DOI : 10.1145/3079758

H. Li, X. Fan, L. Jiao, W. Cao, X. Zhou et al., A high performance FPGA-based accelerator for large-scale convolutional neural networks, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '16, 2016.

A. Lavin and S. Gray, Fast Algorithms for Convolutional Neural Networks. arXiv e-print, 2015.
DOI : 10.1109/cvpr.2016.435

URL : http://arxiv.org/pdf/1509.09308

P. Li and D. J. Lilja, Using stochastic computing to implement digital image processing algorithms, Proceedings of the IEEE International Conference on Computer Design -ICCD '11, 2011.
DOI : 10.1109/iccd.2011.6081391

L. Lu, Y. Liang, Q. Xiao, and S. Yan, Evaluating fast algorithms for convolutional neural networks on FPGAs, Proceedings of the IEEE Annual International Symposium on Field-Programmable Custom Computing Machines -FCCM '17, 2017.

A. Edward, . Lee, G. David, and . Messerschmitt, Synchronous data flow, Proceedings of the IEEE, 1987.

T. Lundin and P. Moerland, Quantization and Pruning of Multilayer Perceptrons: Towards Compact Neural Networks, 1997.

M. Lmb-+-14-;-tsung-yi-lin, S. Maire, J. Belongie, P. Hays, D. Perona et al., Microsoft COCO: Common Objects in Context, Proceedings of the European Conference on Computer Vision -ECCV'14, 2014.

M. Leeser, S. Miller, and H. Yu, Smart Camera Based on Reconfigurable Hardware Enables Diverse Real-Time Applications, Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines -FCCM'04, 2004.
DOI : 10.1109/fccm.2004.53

URL : http://www.cse.buffalo.edu/courses/cse725/peter/Leeser_2004.pdf

J. Long, E. Shelhamer, and T. Darrell, Fully Convolutional Networks for Semantic Segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '15, 2015.

D. Lin, S. Talathi, V. Liu, M. Wang, H. Foroosh et al., Fixed Point Quantization of Deep Convolutional Networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '15, 2015.

. H-r-mahdiani, . Ahmadi, C. S-m-fakhraie, and . Lucas, Bio-Inspired Imprecise Computational Blocks for Efficient VLSI Implementation of SoftComputing Applications, IEEE Transactions on Circuits and Systems I: Regular Papers, vol.57, issue.4, 2010.

L. Maggiani, Heterogeneous Smart Cameras: towards the Internet of Reconfigurable Things, 2017.

C. Mbp-+-15]-luca-maggiani, M. Bourrasset, F. Petracca, P. Berry, C. Pagano et al., HOG-Dot: A Parallel Kernel-Based Gradient Extraction for Embedded Image Processing, IEEE Signal Processing Letters, 2015.

Y. Ma, Y. Cao, S. Vrudhula, and J. Seo, An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '17, 2017.

Y. Ma, Y. Cao, S. Vrudhula, and J. Seo, Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '17, 2017.

G. Mdc-+-16]-paolo-meloni, F. Deriu, I. Conti, L. Loi, L. Raffo et al., Curbing the Roofline : a Scalable and Flexible Architecture for CNNs on FPGA, Proceedings of the ACM International Conference on Computing Frontiers -CF '16, 2016.

M. Motamedi and P. Gysel, Venkatesh Akella, and Soheil Ghiasi. Design space exploration of FPGA-based Deep Convolutional Neural Networks, Proceedings of the Asia and South Pacific Design Automation Conference -ASPDAC'16, 2016.

M. Motamedi, P. Gysel, and S. Ghiasi, PLACID: A Platform for FPGA-Based Accelerator Creation for DCNNs, ACM Transactions on Multimedia Computing, vol.13, issue.4, 2017.

. Microsoft, Microsoft unveils Project Brainwave for real-time AI, 2017.

S. Mittal, A Survey of Techniques for Approximate Computing, ACM Computing Surveys, vol.48, issue.4, 2016.

Y. Ma, M. Kim, Y. Cao, S. Vrudhula, and J. Seo, End-toend scalable FPGA accelerator for deep residual networks, Proceedings of the IEEE International Symposium on Circuits and Systems -ISCAS '17, 2017.

B. Moons and V. Marian, A 0.3-2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets, IEEE Symposium on VLSI Circuits, 2016.

N. Msc-+-16]-yufei-ma, Y. Suda, J. S. Cao, S. Seo, and . Vrudhula, Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA, 2016.

. Mtk-+-17]-pavlo, S. Molchanov, T. Tyree, T. Karras, J. Aila et al., Pruning Convolutional Neural Networks for Resource Efficient Learning, 2017.

H. Nakahara, T. Fujii, and S. Sato, A fully connected layer elimination for a binarizec convolutional neural network on an FPGA, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '17, 2017.

E. Nurvitadhi, S. Subhaschandra, G. Boudoukh, G. Venkatesh, J. Sim et al., Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?, Proceedings of the ACM/SIGDA International Symposium on FieldProgrammable Gate Arrays -FPGA '17, 2017.

. Nvidia, GPU-Based Deep Learning Inference: A Performance and Power Analysis. White Paper, 2015.

. Nvidia, Nvidia Tesla P100 GPU Architecture. White Paper, 2016.

. Nvidia, Nvidia Tesla V100 GPU Architecture, p.2017

Y. Netzer and T. Wang, Reading digits in natural images with unsupervised feature learning, Advances in Neural Information Processing Systems -NIPS'11, 2011.

H. Nakahara, H. Yonekawa, T. Fujii, and S. Sato, A Lightweight YOLOv2, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '18, 2018.

F. Ortega, J. M. Jerez, D. Urdamunoz, R. Luquebaena, and L. Franco, Efficient Implementation of the Backpropagation Algorithm in FPGAs and Microcontrollers, IEEE Transactions on Neural Networks and Learning Systems, vol.27, issue.9, 2016.

K. Ovtcharov, O. Ruwase, J. Kim, J. Fowers, K. Strauss et al., Accelerating Deep Convolutional Neural Networks Using Specialized Hardware. White paper, 2015.

M. Pelcat, C. Bourrasset, L. Maggiani, and F. Berry, Design productivity of a high level synthesis compiler versus HDL, Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation -SAMOS'16, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01358210

A. Prostboucle, A. Bourge, F. Pétrot, H. Alemdar, N. Caldwell et al., Scalable High-Performance Architecture for Convolutional Ternary Neural Networks on FPGA, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '17, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01563763

H. Perkins, Deep CL: OpenCL library to train deep convolutional neural networks, 2017.

Y. Pan and P. Kumar-meher, Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation, IEEE Transactions on Circuits and Systems I: Regular Papers, vol.61, issue.2, 2014.

J. Qiu, J. Wang, S. Yao, K. Guo, B. Li et al., Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '16, 2016.

M. Rusci, L. Cavigelli, and L. Benini, Design Automation for Binarized Neural Networks: A Quantum Leap Opportunity? arXiv preprint, 2017.

H. Xukan-ran, Z. Chen, J. Liu, and . Chen, Delivering Deep Learning to Mobile Devices via Offloading, Proceedings of the Workshop on Virtual Reality and Augmented Reality Network, VR/AR Network '17, 2017.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '16, 2016.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, 2014.

J. Redmon and A. Farhadi, YOLOv3: An Incremental Improvement, 2018.

K. Shaoqing-ren, R. He, J. Girshick, and . Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

Y. +-17]-samyam-rajbhandari, O. He, M. Ruwase, T. Carbin, S. Chilimbi et al., Optimizing CNNs on Multicores for Scalability, Performance and Goodput, Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems -ASPLOS'17, vol.51, 2017.

A. Ren, J. Li, Z. Li, C. Ding, X. Qian et al., Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing. Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems -ASPLOS'17, 2017.

P. +-16]-brandon-reagen, R. Whatmough, S. Adolf, H. Rama, S. Lee et al., Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, Proceedings of the International Symposium on Computer Architecture -ISCA '16, 2016.

A. Mohammed, M. Salem, F. Appel, B. Winkler, and . Meffert, FPGA-based Smart Camera for 3D wavelet-based image segmentation, Proceedings of the International Conference on Distributed Smart Cameras -ICDSC'08, 2008.

J. Serot, F. Berry, and C. Bourrasset, High-level dataflow programming for real-time image processing on smart cameras, Journal of Real-Time Image Processing, vol.12, issue.4, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01626464

N. Suda, V. Chandra, G. Dasika, A. Mohanty, Y. Ma et al., Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '16, 2016.

V. Sze, Y. Chen, T. Yang, and J. Emer, Efficient Processing of Deep Neural Networks: A Tutorial and Survey, Proceedings of the IEEE, vol.105, issue.12, 2017.

G. Richard and . Shoup, Murugan Sankaradas, Venkata Jakkula, Srihari Cadambi, Srimat Chakradhar, Igor Durdanovic, Eric Cosatto, and Hans Peter Graf. A Massively Parallel Coprocessor for Convolutional Neural Networks, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing -ICASSP '17, 1994.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going Deeper with Convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '15, 2015.

H. Su and S. Maji, Evangelos Kalogerakis, and Erik Learned-Miller. Multi-view Convolutional Neural Networks for 3D Shape Recognition, Proceedings of the IEEE International Conference on Computer Vision -ICCV '15, 2015.

H. Sharma, J. Park, D. Mahajan, E. Amaro, J. K. Kim et al., From high-level deep neural models to FPGAs, Proceedings of the International Symposium on Microarchitecture -MICRO '16, 2016.

+. Shen, Y. Qiao, Y. Huang, M. Wen, and C. Zhang, Towards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs, Proceedings of the International Symposium on Circuits and Systems -ISCAS'18, 2018.

A. Sironi, B. Tekin, R. Rigamonti, V. Lepetit, and P. Fua, Learning separable filters, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.1, 2015.

I. Sutherland, Online graphical specification of procedures, 1966.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

M. Taylor, CSE 548: Computer Architecture. Dataflow Computers, 2006.

L. Theis, I. Korshunova, A. Tejani, and F. Huszár, Faster gaze prediction with dense networks and Fisher pruning, 2018.

Y. Umuroglu, J. Nicholas, G. Fraser, M. Gambardella, P. Blott et al., FINN: A Framework for Fast, Scalable Binarized Neural Network Inference, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '17, 2017.

S. Venieris and C. Bouganis, FpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs, Proceedings of the IEEE Annual International Symposium on Field-Programmable Custom Computing Machines -FCCM '16, 2016.

S. Venieris and C. Bouganis, Latency-Driven Design for FPGA-based Convolutional Neural Networks, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '17, 2017.

I. Stylianos and . Venieris, Alexandros Kouris, and Christos-Savvas Bouganis. Toolflows for Mapping Convolutional Neural Networks on FPGAs, vol.51, 2018.

. J-von-neumann, Probabilistic logics and the synthesis of reliable organisms from unreliable components, Automata Studies, 1956.

Y. Voronenko and M. Püschel, Multiplierless multiple constant multiplication, ACM Transactions on Algorithms, vol.3, issue.2, 2007.

E. Walters, Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs, Electronics, vol.6, issue.4, 2017.

D. Wang, PipeCNN: An OpenCL-based FPGA Accelerator for Convolutinal Neural Networks, Proceedings of the International Conference on Field-Programmable Technology -FPT '17, 2017.

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on Image Processing, vol.13, issue.4, 2004.

D. Williamson, Dynamically scaled fixed point arithmetic, Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Conference, 1991.

S. Winograd, Arithmetic complexity of computations, vol.33, 1980.

+. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, Quantized Convolutional Neural Networks for Mobile Devices, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '16, 2016.

J. Woodhouse, Big, big, big data: higher and higher resolution video surveillance, 2014.

L. Wang, W. Ouyang, X. Wang, and H. Lu, Visual Tracking with Fully Convolutional Networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR'15, 2015.

S. Williams, A. Waterman, and D. Patterson, Roofline: An insightful visual performance model for multicore architectures, Communications of the ACM, vol.52, issue.4, 2009.

Y. Wang, M. Zhang, and J. Yang, Exploiting Parallelism for Convolutional Connections in Processing-In-Memory Architecture, Proceedings of the Annual Conference on Design Automation -DAC '17, 2017.

Xilinx, Introduction to FPGA Design with Vivado High-Level Synthesis, UG998, 2013.

T. Yang, Y. Chen, and V. Sze, Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '17, 2017.

J. C. Yang, M. Everett, C. Buehler, and L. McMillan, A real-time distributed light field camera, The Eurographics Association, 2002.

C. Zhang, Z. Fang, P. Zhou, P. Pan, and J. Cong, Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks, Proceedings of the International Conference on Computer-Aided Design -ICCAD '16, 2016.

C. Zhu, S. Han, H. Mao, and W. J. Dally, Trained Ternary Quantization, Proceedings of the International Conference on Learning Representations -ICLR'17, 2017.

J. Zhang and J. Li, Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '17, 2017.

C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao et al., Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '15, 2015.

R. Zhao, W. Ouyang, H. Li, and X. Wang, Saliency detection by multi-context deep learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '15, 2015.

L. Zhuo and V. K. Prasanna, Sparse Matrix-Vector multiplication on FPGAs, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '05, 2005.

C. Zhang and V. Prasanna, Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '17, 2017.

R. Zhao, W. Song, W. Zhang, T. Xing, J. Lin et al., Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '17, 2017.

S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen et al., DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016.

C. Zhang, D. Wu, J. Sun, G. Sun, G. Luo et al., Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster, Proceedings of the International Symposium on Low Power Electronics and Design -ISLPED '16, 2016.

S. Zhou, Y. Wang, H. Wen, Q. He, and Y. Zou, Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks, vol.32, 2017.

Publications

Journals

K. Abdelouahab, M. Pelcat, J. Sérot, and F. Berry, Tactics to Directly Map CNN graphs on Embedded FPGAs, 2017.

J. Bonnard, K. Abdelouahab, M. Pelcat, and F. Berry, Real-time Embedded Object Classification with FPGA-based Distributed Multi-View CNNs, Submitted to the Design Automation Conference -DAC'19, 2018.

K. Abdelouahab, M. Pelcat, and F. Berry, The Challenge of Multi-Operand Adders in CNNs on FPGAs, And How NOT to Solve It!, Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation -SAMOS'18, 2018.

K. Abdelouahab, M. Pelcat, and F. Berry, PhD Forum: Why TanH can be a Hardware Friendly Activation Function for CNNs, Proceedings of the 11th International Conference on Distributed Smart Cameras -ICDSC'17, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01654697

K. Abdelouahab, C. Bourrasset, M. Pelcat, J. Sérot, J. C. Quinton et al., A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA, Proceedings of the 10th International Conference on Distributed Smart Cameras -ICDSC'16, 2016.

Book chapters

K. Abdelouahab, M. Pelcat, and F. Berry, Accelerating CNN inference on FPGAs: A Survey, Deep Learning in Computer Vision: Theories and Applications, 2018.