, 3) -original "basic" block

B(1, 3, 1) - with the same dimensionality of all convolutions

B(3, 1, 1) - Network-in-Network style block

B. Alexe, T. Deselaers, and V. Ferrari, Measuring the objectness of image windows, 2012.

P. Arbeláez, J. Pont-Tuset, J. Barron, F. Marques, and J. Malik, Multiscale combinatorial grouping, CVPR, 2014.

J. Ba, R. Kiros, and G. E. Hinton, Layer normalization, CoRR, 2016.

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

V. Balntas, E. Johns, L. Tang, and K. Mikolajczyk, PN-Net: Conjoined triple deep network for learning local image descriptors, 2016.

H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, ECCV, pp.404-417, 2006.

S. Becker and Y. LeCun, Improving the convergence of back-propagation learning with second-order methods, Proc. of the 1988 Connectionist Models Summer School, pp.29-37, 1989.

S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, Inside-outside net: Detecting objects in context with skip pooling and recurrent neural nets, 2016.

Y. Bengio and X. Glorot, Understanding the difficulty of training deep feedforward neural networks, Proceedings of AISTATS 2010, vol.9, pp.249-256, 2010.

Y. Bengio and Y. LeCun, Scaling learning algorithms towards AI, Large Scale Kernel Machines, 2007.

L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, Fully-convolutional siamese networks for object tracking, European Conference on Computer Vision, pp.850-865, Springer, 2016.

M. Bianchini and F. Scarselli, On the complexity of shallow and deep neural network classifiers, 22nd European Symposium on Artificial Neural Networks, 2014.

X. Boix, M. Gygli, G. Roig, and L. Van Gool, Sparse quantization for patch description, CVPR, 2013.

J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, Signature verification using a siamese time delay neural network, NIPS, 1993.

M. Brown, G. Hua, and S. Winder, Discriminative learning of local image descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.1, pp.43-57, 2011.

A. E. Bryson, A gradient method for optimizing multi-stage allocation processes, Proc. Harvard Univ. Symposium on digital computers and their applications, 1961.

C. Bucila, R. Caruana, and A. Niculescu-Mizil, Model compression, KDD, pp.535-541, 2006.

M. Calonder, V. Lepetit, and P. Fua, BRIEF: Binary robust independent elementary features, 2010.

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the devil in the details: Delving deep into convolutional nets, British Machine Vision Conference, 2014.

T. Chen, I. Goodfellow, and J. Shlens, Net2Net: Accelerating learning via knowledge transfer, International Conference on Learning Representations, 2016.

S. Chopra, R. Hadsell, and Y. LeCun, Learning a similarity metric discriminatively, with application to face verification, CVPR, 2005.

C. B. Choy, J. Gwak, S. Savarese, and M. Chandraker, Universal correspondence network, Advances in Neural Information Processing Systems, vol.29, pp.2414-2422, 2016.

D. Clevert, T. Unterthiner, and S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), 2015.

T. Cohen and M. Welling, Group equivariant convolutional networks, ICML, 2016.

R. Collobert, K. Kavukcuoglu, and C. Farabet, Torch7: A matlab-like environment for machine learning, BigLearn, NIPS Workshop, 2011.

B. Conejo, N. Komodakis, S. Leprince, and J. Avouac, Inference by learning: Speeding-up graphical model optimization via a coarse-to-fine cascade of pruning classifiers, NIPS, 2014.

G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems (MCSS), vol.2, pp.303-314, 1989.

J. Dai, K. He, and J. Sun, Instance-aware semantic segmentation via multi-task network cascades, CVPR, 2016.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, CVPR, 2009.

M. Denil, L. Bazzani, H. Larochelle, and N. de Freitas, Learning where to attend with deep architectures for image tracking, Neural Computation, 2012.

P. Dollár, R. Appel, S. Belongie, and P. Perona, Fast feature pyramids for object detection, 2014.

S. E. Dreyfus, The computational solution of optimal control problems with time lag, IEEE Transactions on Automatic Control, vol.18, issue.4, pp.383-385, 1973.

H. Drucker and Y. LeCun, Improving generalization performance using double backpropagation, IEEE Transactions on Neural Networks, vol.3, issue.6, pp.991-997, 1992.

J. C. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, COLT, 2010.

V. Dumoulin and F. Visin, A guide to convolution arithmetic for deep learning, 2016.

D. Eigen, C. Puhrsch, and R. Fergus, Depth map prediction from a single image using a multi-scale deep network, NIPS, 2014.

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The PASCAL visual object classes (VOC) challenge, 2010.

O. Faugeras, T. Viéville, E. Theron, J. Vuillemin, B. Hotz et al., Real-time correlation-based stereo: algorithm, implementations and applications, 1993.
URL : https://hal.archives-ouvertes.fr/inria-00074658

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object detection with discriminatively trained part-based models, 2010.

P. Fischer, A. Dosovitskiy, and T. Brox, Descriptor matching with convolutional neural networks: a comparison to SIFT, 2014.

K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.36, pp.193-202, 1980.

V. Kumar B G, G. Carneiro, and I. Reid, Learning local image descriptors with deep siamese and triplet convolutional networks by minimizing global loss functions, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5385-5394, 2016.

C. F. Gauss, Theoria motus corporum coelestium in sectionibus conicis solem ambientium, 1809.

S. Gidaris and N. Komodakis, Object detection via a multi-region and semantic segmentation-aware CNN model, ICCV, 2015.

S. Gidaris and N. Komodakis, LocNet: Improving localization accuracy for object detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832507

R. Girshick, Fast R-CNN, ICCV, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.

M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S. M. Seitz, Multi-view stereo for community photo collections, Proceedings of the 11th International Conference on Computer Vision (ICCV 2007), pp.265-270, 2007.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, Maxout networks, Proceedings of the 30th International Conference on Machine Learning (ICML'13), pp.1319-1327, 2013.

B. Graham, Fractional max-pooling, 2014.

X. Han, T. Leung, Y. Jia, R. Sukthankar, and A. C. Berg, MatchNet: Unifying feature and metric learning for patch-based matching, CVPR, 2015.

B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, Hypercolumns for object segmentation and fine-grained localization, CVPR, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, ECCV, 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, IEEE International Conference on Computer Vision (ICCV), pp.1026-1034, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Identity mappings in deep residual networks, ECCV, 2016.

G. E. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, 2015.

S. Hochreiter and J. Schmidhuber, Long short-term memory, 1997.

E. Hoffer and N. Ailon, Deep metric learning using triplet network, SIMBAD, 2015.

D. Hoiem, Y. Chodpathumwan, and Q. Dai, Diagnosing error in object detectors, ECCV, 2012.

J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings National Academy of Science, vol.79, pp.2554-2558, 1982.

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989.

J. Hosang, R. Benenson, P. Dollár, and B. Schiele, What makes for effective detection proposals?, 2015.

G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, Deep networks with stochastic depth, ECCV, 2016.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp.448-456, 2015.

A. G. Ivakhnenko, The group method of data handling - a rival of the method of stochastic approximation, Soviet Automatic Control, vol.13, issue.3, pp.43-55, 1968.

A. G. Ivakhnenko, Polynomial theory of complex systems, IEEE Transactions on Systems, Man and Cybernetics, issue.4, pp.364-378, 1971.

A. G. Ivakhnenko and V. G. Lapa, Cybernetic Predicting Devices, CCM Information Corporation, 1965.

A. G. Ivakhnenko, V. G. Lapa, and R. N. McDonough, Cybernetics and forecasting techniques, 1967.

H. J. Kelley, Gradient theory of optimal flight paths, ARS Journal, vol.30, issue.10, pp.947-954, 1960.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

T. Kohonen, Correlation matrix memories, IEEE Transactions on Computers, vol.100, issue.4, pp.353-359, 1972.

T. Kohonen, Self-Organization and Associative Memory, 1988.

N. Komodakis, G. Tziritas, and N. Paragios, Fast, approximately optimal solutions for single and dynamic MRFs, CVPR, 2007.

M. Kozinski, R. Gadde, S. Zagoruyko, G. Obozinski, and R. Marlet, A MRF shape prior for facade parsing with occlusions, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2820-2828, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01232598

J. Martens and R. B. Grosse, Optimizing neural networks with Kronecker-factored approximate curvature, ICML, 2015.

W. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, vol.7, pp.115-133, 1943.

K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.27, issue.10, pp.1615-1630, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00548227

V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, Recurrent models of visual attention, NIPS, 2014.

G. F. Montúfar, R. Pascanu, K. Cho, and Y. Bengio, On the number of linear regions of deep neural networks, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pp.2924-2932, 2014.

E. Nowak and F. Jurie, Learning Visual Similarity Measures for Comparing Never Seen Objects, CVPR 2007 - IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00203958

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Is object localization for free? - weakly-supervised learning with convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01015140

E. Oyallon, E. Belilovsky, and S. Zagoruyko, Scaling the scattering transform: Deep hybrid networks, The IEEE International Conference on Computer Vision (ICCV), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01495734

M. Papadomanolaki, M. Vakalopoulou, S. Zagoruyko, and K. Karantzalos, Benchmarking deep learning frameworks for the classification of very high resolution satellite multispectral data, ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, III-7, pp.83-88, 2016.

D. B. Parker, Learning-logic, 1985.

O. M. Parkhi, A. Vedaldi, and A. Zisserman, Deep face recognition, British Machine Vision Conference, 2015.

P. O. Pinheiro, R. Collobert, and P. Dollár, Learning to segment object candidates, NIPS, 2015.

P. O. Pinheiro, T. Lin, R. Collobert, and P. Dollár, Learning to refine object segments, ECCV, 2016.

B. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, pp.1-17, 1964.

L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes, 1961.

A. Quattoni and A. Torralba, Recognizing indoor scenes, CVPR, 2009.

T. Raiko, H. Valpola, and Y. LeCun, Deep learning made easier by linear transformations in perceptrons, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, vol.22, pp.924-932, 2012.

A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN features off-the-shelf: An astounding baseline for recognition, IEEE Conference on Computer Vision and Pattern Recognition, CVPR Workshops, pp.512-519, 2014.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS, 2015.

R. A. Rensink, The dynamic representation of scenes, Visual Cognition, pp.17-42, 2000.

A. Romero, N. Ballas, S. Ebrahimi-Kahou, A. Chassang, C. Gatta et al., FitNets: Hints for thin deep nets, 2014.

F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, p.386, 1958.

E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, ORB: An efficient alternative to SIFT or SURF, Proceedings of the 2011 International Conference on Computer Vision, ICCV '11, pp.2564-2571, 2011.

T. Salimans and D. P. Kingma, Weight normalization: A simple reparameterization to accelerate training of deep neural networks, Neural Information Processing Systems, 2016.

J. Schmidhuber, Learning complex, extended sequences using the principle of history compression, Neural Computation, vol.4, issue.2, pp.234-242, 1992.

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2015.

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh et al., Grad-CAM: Visual explanations from deep networks via gradient-based localization, The IEEE International Conference on Computer Vision (ICCV), 2017.

P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, Pedestrian detection with unsupervised multi-stage feature learning, CVPR, 2013.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., OverFeat: Integrated recognition, localization and detection using convolutional networks, ICLR, 2014.

E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua et al., Discriminative Learning of Deep Convolutional Feature Point Descriptors, Proceedings of the International Conference on Computer Vision (ICCV), 2015.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2015.

K. Simonyan, A. Vedaldi, and A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, ICLR Workshop, 2014.

N. Snavely, S. M. Seitz, and R. Szeliski, Photo tourism: Exploring photo collections in 3D, ACM Trans. Graph, vol.25, issue.3, pp.835-846, 2006.

N. Snavely, S. M. Seitz, and R. Szeliski, Modeling the world from internet photo collections, Int. J. Comput. Vision, vol.80, issue.2, pp.189-210, 2008.

J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, Striving for simplicity: The all convolutional net, 2015.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, 2014.

R. K. Srivastava, K. Greff, and J. Schmidhuber, Training very deep networks, Advances in Neural Information Processing Systems, vol.28, pp.2377-2385, 2015.

C. Strecha, W. von Hansen, L. Van Gool, P. Fua, and U. Thoennessen, On benchmarking camera calibration and multi-view stereo for high resolution imagery, CVPR, 2008.
DOI : 10.1109/cvpr.2008.4587706

I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton, On the importance of initialization and momentum in deep learning, Proceedings of the 30th International Conference on Machine Learning (ICML-13), vol.28, pp.1139-1147, 2013.

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan et al., Intriguing properties of neural networks, 2013.

C. Szegedy, S. Reed, D. Erhan, and D. Anguelov, Scalable, high-quality object detection, 2014.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, CVPR, 2015.

C. Szegedy, S. Ioffe, and V. Vanhoucke, Inception-v4, Inception-ResNet and the impact of residual connections on learning, 2016.

Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, DeepFace: Closing the gap to human-level performance in face verification, Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

R. Tao, E. Gavves, and A. W. Smeulders, Siamese instance search for tracking, 2016.

E. Tola, V. Lepetit, and P. Fua, A Fast Local Descriptor for Dense Matching, Proceedings of Computer Vision and Pattern Recognition, 2008.

A. Torralba, Contextual priming for object detection, 2003.

T. Trzcinski, C. M. Christoudias, V. Lepetit, and P. Fua, Learning image descriptors with the boosting-trick, NIPS, 2012.

T. Trzcinski, C. M. Christoudias, P. Fua, and V. Lepetit, Boosting binary keypoint descriptors, IEEE Conference on Computer Vision and Pattern Recognition, pp.2874-2881, 2013.

J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective search for object recognition, 2013.

N. Vasilache, J. Johnson, M. Mathieu, S. Chintala, S. Piantino et al., Fast convolutional nets with fbfft: A GPU performance evaluation, 2014.

P. Viola and M. J. Jones, Robust real-time face detection, 2004.

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, The Caltech-UCSD Birds-200-2011 Dataset, 2011.

P. J. Werbos, Applications of advances in nonlinear sensitivity analysis, Proceedings of the 10th IFIP Conference, 31.8 - 4.9, pp.762-770, 1981.

A. C. Wilson, R. Roelofs, M. Stern, N. Srebro, and B. Recht, The marginal value of adaptive gradient methods in machine learning, NIPS, 2017.

K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville et al., Show, attend and tell: Neural image caption generation with visual attention, ICML, 2015.

Z. Yang, X. He, J. Gao, L. Deng, and A. J. Smola, Stacked attention networks for image question answering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.21-29, 2016.

K. M. Yi, Y. Verdie, P. Fua, and V. Lepetit, Learning to assign orientations to feature points, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.107-116, 2016.

K. M. Yi, E. Trulls, V. Lepetit, and P. Fua, LIFT: Learned invariant feature transform, ECCV, 2016.

S. Zagoruyko and N. Komodakis, Learning to compare image patches via convolutional neural networks, Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01246261

S. Zagoruyko and N. Komodakis, Deep compare: A study on using convolutional neural networks to compare image patches, Computer Vision and Image Understanding Special Issue: Deep Learning, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01830004

S. Zagoruyko and N. Komodakis, Wide residual networks, BMVC, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832503

S. Zagoruyko and N. Komodakis, DiracNets: Training very deep neural networks without skip-connections, 2017.

S. Zagoruyko and N. Komodakis, Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer, ICLR, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01832769

S. Zagoruyko, A. Lerer, T. Lin, P. O. Pinheiro, S. Gross et al., A multipath network for object detection, BMVC, 2016.

J. Zbontar and Y. LeCun, Computing the stereo matching cost with a convolutional neural network, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1592-1599, 2015.

M. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, ECCV, 2014.

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Learning deep features for discriminative localization, Computer Vision and Pattern Recognition, 2016.

C. L. Zitnick and P. Dollár, Edge boxes: Locating object proposals from edges, ECCV, 2014.