M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., Tensorflow: Large-scale machine learning on heterogeneous systems, 2015.

R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua et al., Slic superpixels, 2010.

M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau et al., Learning to learn by gradient descent by gradient descent, Advances in Neural Information Processing Systems (NIPS), 2016.

R. Anil, G. Pereyra, A. Passos, R. Ormandi, G. E. Dahl et al., Large scale distributed neural network training through online distillation, International Conference on Learning Representations (ICLR), 2018.

A. Antoniou, A. Storkey, and H. Edwards, Data augmentation generative adversarial networks, 2017.

A. Arnab and P. H. S. Torr, Pixelwise instance segmentation with a dynamically instantiated network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

V. Badrinarayanan, A. Kendall, and R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE transactions on Pattern Analysis and Machine Intelligence, 2017.

M. Bai and R. Urtasun, Deep watershed transform for instance segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

M. Bar and S. Ullman, Spatial context in recognition, Perception, vol.25, pp.343-352, 1996.

E. Barnea and O. Ben-shahar, On the utility of context (or the lack thereof) for object detection, 2017.

E. Bart and S. Ullman, Cross-generalization: Learning novel classes from a single example by feature replacement, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol.1, pp.672-679, 2005.

R. Barth, J. Hemming, and E. van Henten, Improved part segmentation performance by optimising realism of synthetic images using cycle generative adversarial networks, 2018.

S. Bell, L. Zitnick, K. Bala, and R. Girshick, Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

S. Ben-david, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira et al., A theory of learning from different domains, Machine learning, vol.79, issue.1-2, pp.151-175, 2010.

R. Benenson, S. Popov, and V. Ferrari, Large-scale interactive object segmentation with human annotators, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

L. Bertinetto, J. F. Henriques, P. Torr, and A. Vedaldi, Meta-learning with differentiable closed-form solvers, International Conference on Learning Representations (ICLR), 2019.

L. Bertinetto, J. F. Henriques, J. Valmadre, P. Torr, and A. Vedaldi, Learning feed-forward one-shot learners, Advances in Neural Information Processing Systems (NIPS), pp.523-531, 2016.

A. Bietti, G. Mialon, D. Chen, and J. Mairal, A kernel perspective for regularizing deep neural networks, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01884632

L. Breiman, Heuristics of instability and stabilization in model selection, The Annals of Statistics, vol.24, issue.6, pp.2350-2383, 1996.

M. Buhrmester, T. Kwang, and S. D. Gosling, Amazon's mechanical turk: A new source of inexpensive, yet high-quality, data?, Perspectives on Psychological Science, vol.6, pp.3-5, 2011.

M. Calonder, V. Lepetit, C. Strecha, and P. Fua, Brief: Binary robust independent elementary features, Proceedings of the European Conference on Computer Vision (ECCV), pp.778-792, 2010.

M. Caron, P. Bojanowski, A. Joulin, and M. Douze, Deep clustering for unsupervised learning of visual features, Proceedings of the European Conference on Computer Vision (ECCV), 2018.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected CRFs, International Conference on Learning Representations (ICLR), 2015.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.40, pp.834-848, 2018.

L. Chen, G. Papandreou, F. Schroff, and H. Adam, Rethinking atrous convolution for semantic image segmentation, 2017.

W. Chen, Y. Liu, Z. Kira, Y. Wang, and J. Huang, A closer look at few-shot classification, International Conference on Learning Representations (ICLR), 2019.

X. Chen, R. Girshick, K. He, and P. Dollár, Tensormask: A foundation for dense object segmentation, 2019.

M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky, Exploiting hierarchical context on a large database of object categories, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

W. Chu and D. Cai, Deep feature based contextual model for object detection, Neurocomputing, vol.275, pp.1035-1042, 2018.

C. Cortes and V. Vapnik, Support-vector networks, Machine learning, vol.20, issue.3, pp.273-297, 1995.

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints

E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, AutoAugment: Learning augmentation policies from data, 2018.

J. Dai, K. He, and J. Sun, Instance-aware semantic segmentation via multi-task network cascades, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
URL : https://hal.archives-ouvertes.fr/inria-00548512

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2018.

T. G. Dietterich, Ensemble methods in machine learning, International workshop on multiple classifier systems, 2000.

S. K. Divvala, D. Hoiem, J. H. Hays, A. A. Efros, and M. Hebert, An empirical study of context in object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

C. Doersch and A. Zisserman, Multi-task self-supervised visual learning, Proceedings of the International Conference on Computer Vision (ICCV), 2017.

A. Dosovitskiy and T. Brox, Inverting visual representations with convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

N. Dvornik, J. Mairal, and C. Schmid, Modeling visual context is key to augmenting object detection datasets, Proceedings of the European Conference on Computer Vision (ECCV), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01844474

N. Dvornik, J. Mairal, and C. Schmid, On the importance of visual context for data augmentation in scene understanding, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01869784

N. Dvornik, K. Shmelkov, J. Mairal, and C. Schmid, Blitznet: A real-time deep network for scene understanding, Proceedings of the International Conference on Computer Vision (ICCV), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01573361

D. Dwibedi, I. Misra, and M. Hebert, Cut, paste and learn: Surprisingly easy synthesis for instance detection, Proceedings of the International Conference on Computer Vision (ICCV), 2017.

D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, Scalable object detection using deep neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The PASCAL visual object classes (VOC) challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.

C. Farabet, C. Couprie, L. Najman, and Y. Lecun, Learning hierarchical features for scene labeling, IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.35, pp.1915-1929, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00742077

L. Fei-fei, R. Fergus, and P. Perona, One-shot learning of object categories, IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.28, pp.594-611, 2006.

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.32, pp.1627-1645, 2010.

S. Fidler, R. Mottaghi, A. Yuille, and R. Urtasun, Bottom-up segmentation for top-down detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

C. Finn, P. Abbeel, and S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, 2017.

M. Frid-adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger et al., Gan-based synthetic medical image augmentation for increased cnn performance in liver lesion classification, Neurocomputing, vol.321, pp.321-331, 2018.

J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning, vol.1, 2001.

C. Fu, W. Liu, A. Ranga, A. Tyagi, and A. Berg, DSSD: Deconvolutional single shot detector, 2017.

Y. Ganin and V. Lempitsky, Unsupervised domain adaptation by backpropagation, 2015.

M. Garnelo, D. Rosenbaum, C. Maddison, T. Ramalho, D. Saxton et al., Conditional neural processes, 2018.

G. Georgakis, A. Mousavian, A. C. Berg, and J. Kosecka, Synthesizing training data for object detection in indoor scenes, Robotics: Science and Systems, 2017.

S. Gidaris and N. Komodakis, Dynamic few-shot visual learning without forgetting, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01829985

S. Gidaris, P. Singh, and N. Komodakis, Unsupervised representation learning by predicting image rotations, International Conference on Learning Representations (ICLR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01864755

R. Girshick, Fast R-CNN, Proceedings of the International Conference on Computer Vision (ICCV), 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

R. Girshick, I. Radosavovic, G. Gkioxari, P. Dollár, and K. He, Detectron, 2018.

I. Goodfellow, J. Pouget-abadie, M. Mirza, B. Xu, D. Warde-farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems (NIPS), 2014.

S. Gould, R. Fulton, and D. Koller, Decomposing a scene into geometric and semantically consistent regions, Proceedings of the International Conference on Computer Vision (ICCV), 2009.

A. Gupta, A. Vedaldi, and A. Zisserman, Synthetic data for text localisation in natural images, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

R. H. R. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung, Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit, Nature, vol.405, issue.6789, p.947, 2000.

A. Handa, V. Patraucean, V. Badrinarayanan, S. Stent, and R. Cipolla, Understanding real world indoor scenes with synthetic data, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

S. J. Hanson and L. Y. Pratt, Comparing biases for minimal network construction with back-propagation, Advances in Neural Information Processing Systems (NIPS), 1989.

B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik, Semantic contours from inverse detectors, Proceedings of the International Conference on Computer Vision (ICCV), 2011.

B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, Simultaneous detection and segmentation, Proceedings of the European Conference on Computer Vision (ECCV), 2014.

B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, Hypercolumns for object segmentation and fine-grained localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

B. Hariharan and R. Girshick, Low-shot visual recognition by shrinking and hallucinating features, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

K. He, R. Girshick, and P. Dollár, Rethinking ImageNet pre-training, 2018.

K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, Mask R-CNN, Proceedings of the International Conference on Computer Vision (ICCV), 2017.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

X. He, R. S. Zemel, and D. Ray, Learning and incorporating top-down cues in image segmentation, Proceedings of the European Conference on Computer Vision (ECCV), 2006.

A. Hernández-García and P. König, Data augmentation instead of explicit regularization, 2018.

G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, NIPS Deep Learning and Representation Learning Workshop

H. Inoue, Data augmentation by pairing samples for images classification, 2018.

H. Jegou, F. Perronnin, M. Douze, J. Sánchez, P. Perez et al., Aggregating local image descriptors into compact codes, IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.34, pp.1704-1716, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00633013

J. Johnson, A. Gupta, and L. Fei-fei, Image generation from scene graphs, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1219-1228, 2018.

K. Karsch, V. Hedau, D. Forsyth, and D. Hoiem, Rendering synthetic objects into legacy photographs, ACM Transactions on Graphics (TOG), vol.30, issue.6, p.157, 2011.

A. Khoreva, R. Benenson, J. H. Hosang, M. Hein, and B. Schiele, Simple does it: Weakly supervised instance and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations (ICLR), 2015.

D. P. Kingma and M. Welling, Auto-encoding variational bayes, International Conference on Learning Representations (ICLR), 2014.

G. Koch, Siamese neural networks for one-shot image recognition, 2015.

I. Kokkinos, UberNet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques, 2009.

S. Kong and C. C. Fowlkes, Recurrent pixel embedding for instance grouping, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

P. Krähenbühl and V. Koltun, Efficient inference in fully connected crfs with gaussian edge potentials, Advances in Neural Information Processing Systems (NIPS), 2011.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012.

S. Kumar and M. Hebert, Discriminative random fields: A discriminative framework for contextual interaction in classification, 2003.

A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin et al., The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale, 2018.

J. Lafferty, A. Mccallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001.

B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum, Human-level concept learning through probabilistic program induction, Science, vol.350, issue.6266, pp.1332-1338, 2015.

Y. Lecun, The mnist database of handwritten digits

Y. Lecun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation applied to handwritten zip code recognition, Neural computation, 1989.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

J. Lemley, S. Bazrafkan, and P. Corcoran, Smart augmentation learning an optimal data augmentation strategy, IEEE Access, vol.5, pp.5858-5869, 2017.

T. Leung and J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision (IJCV), vol.43, pp.29-44, 2001.

H. Li, B. Singh, M. Najibi, Z. Wu, and L. Davis, An analysis of pre-training on object detection, 2019.

J. Dai, Y. Li, K. He, and J. Sun, R-FCN: Object detection via region-based fully convolutional networks, Advances in Neural Information Processing Systems (NIPS), 2016.

Z. Liao, A. Farhadi, Y. Wang, I. Endres, and D. Forsyth, Building a dictionary of image fragments, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

G. Lin and C. Shen, Exploring context with deep structured models for semantic segmentation, IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.40, pp.1352-1366, 2018.

T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, Feature pyramid networks for object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, Focal loss for dense object detection, Proceedings of the International Conference on Computer Vision (ICCV), 2017.

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common objects in context, Proceedings of the European Conference on Computer Vision (ECCV), 2014.

G. Liu, F. A. Reda, K. J. Shih, T. Wang, A. Tao et al., Image inpainting for irregular holes using partial convolutions, Proceedings of the European Conference on Computer Vision (ECCV), pp.85-100, 2018.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single shot multibox detector, Proceedings of the European Conference on Computer Vision (ECCV), 2016.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision (IJCV), vol.60, issue.2, pp.91-110, 2004.

W. Luo, Y. Li, R. Urtasun, and R. Zemel, Understanding the effective receptive field in deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2016.

D. Maclaurin, D. Duvenaud, and R. Adams, Gradient-based hyperparameter optimization through reversible learning, 2015.

D. Mahajan, R. B. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, and L. van der Maaten, Exploring the limits of weakly supervised pretraining, Proceedings of the European Conference on Computer Vision (ECCV), 2018.

J. Mccormac, A. Handa, S. Leutenegger, and A. Davison, Scenenet rgb-d: Can 5m synthetic images beat generic imagenet pre-training on indoor segmentation?, Proceedings of the International Conference on Computer Vision (ICCV), 2017.

T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka, Distance-based image classification: Generalizing to new classes at near-zero cost, IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.35, pp.2624-2637, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00817211

Y. Movshovitz-attias, T. Kanade, and Y. Sheikh, How useful is photo-realistic rendering for visual learning?, Proceedings of the European Conference on Computer Vision (ECCV), 2016.

N. Murray, L. Marchesotti, and F. Perronnin, Ava: A large-scale database for aesthetic visual analysis, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2408-2415, 2012.

A. Newell, K. Yang, and J. Deng, Stacked hourglass networks for human pose estimation, Proceedings of the European Conference on Computer Vision (ECCV), 2016.

N. Dvornik, Implementation of BlitzNet, 2017.

N. Dvornik, Implementation of context-driven data augmentation pipeline, 2018.

N. Dvornik, Implementation of ensemble methods for few-shot classification, 2019.

H. Noh, S. Hong, and B. Han, Learning deconvolution network for semantic segmentation, Proceedings of the International Conference on Computer Vision (ICCV), 2015.

B. Oreshkin, P. Rodríguez López, and A. Lacoste, Tadam: Task dependent adaptive metric for improved few-shot learning, Advances in Neural Information Processing Systems (NeurIPS), 2018.

C. Papageorgiou, M. Oren, and T. A. Poggio, A general framework for object detection, Sixth International Conference on Computer Vision, pp.555-562, 1998.

G. Papandreou, L. Chen, K. P. Murphy, and A. L. Yuille, Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation, Proceedings of the International Conference on Computer Vision (ICCV), 2015.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in pytorch, 2017.

D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, Context encoders: Feature learning by inpainting, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

X. Peng, B. Sun, K. Ali, and K. Saenko, Learning deep object detectors from 3d models, Proceedings of the International Conference on Computer Vision (ICCV), 2015.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the fisher kernel for large-scale image classification, Proceedings of the European Conference on Computer Vision (ECCV), pp.143-156, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00548630

P. O. Pinheiro, T. Lin, R. Collobert, and P. Dollár, Learning to refine object segments, Proceedings of the European Conference on Computer Vision (ECCV), 2016.

D. C. Plaut, Experiments on learning by back propagation, 1986.

P. Pérez, M. Gangnet, and A. Blake, Poisson image editing, ACM Transactions on Graphics (SIGGRAPH'03), vol.22, pp.313-318, 2003.

S. Qiao, C. Liu, W. Shen, and A. L. Yuille, Few-shot image recognition by predicting parameters from activations, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

W. Qiu and A. Yuille, Unrealcv: Connecting computer vision to unreal engine, Proceedings of the European Conference on Computer Vision (ECCV), 2016.

S. Ravi and H. Larochelle, Optimization as a model for few-shot learning, International Conference on Learning Representations (ICLR), 2017.

S.-A. Rebuffi, H. Bilen, and A. Vedaldi, Efficient parametrization of multi-domain deep neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You only look once: Unified, real-time object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

J. Redmon and A. Farhadi, YOLO9000: better, faster, stronger, 2016.

M. Ren, E. Triantafillou, S. Ravi, J. Snell, K. Swersky et al., Meta-learning for semi-supervised few-shot classification, International Conference on Learning Representations (ICLR), 2018.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems (NIPS), 2015.

X. Ren and J. Malik, Learning a classification model for segmentation, Proceedings of the International Conference on Computer Vision (ICCV), 2003.

O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, MICCAI, 2015.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision (IJCV), 2015.

A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu et al., Meta-learning with latent embedding optimization, Advances in Neural Information Processing Systems (NeurIPS), 2018.

C. Sakaridis, D. Dai, and L. Van Gool, Semantic foggy scene understanding with synthetic data, International Journal of Computer Vision (IJCV), vol.126, pp.973-992, 2018.

T. Salimans, A. Karpathy, X. Chen, and D. Kingma, Pixel-cnn++: Improving the pixelcnn with discretized logistic mixture likelihood and other modifications, International Conference on Learning Representations, 2017.

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image classification with the fisher vector: Theory and practice, International Journal of Computer Vision (IJCV), vol.105, issue.3, pp.222-245, 2013.

S. Sankaranarayanan, Y. Balaji, A. Jain, S. N. Lim, and R. Chellappa, Learning from synthetic data: Addressing domain shift for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

R. E. Schapire, A brief introduction to boosting

J. Schmidhuber, J. Zhao, and M. Wiering, Shifting inductive bias with success-story algorithm, adaptive levin search, and incremental self-improvement, Machine Learning, vol.28, pp.105-130, 1997.

J. Shawe-taylor and N. Cristianini, Kernel methods for pattern analysis, 2004.

K. Shmelkov, C. Schmid, and K. Alahari, How good is my gan?, Proceedings of the European Conference on Computer Vision (ECCV), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01850447

J. Shotton, J. Winn, C. Rother, and A. Criminisi, Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation, Proceedings of the European Conference on Computer Vision (ECCV), 2006.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2015.

B. Singh and L. S. Davis, An analysis of scale invariance in object detection - SNIP, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

L. Sixt, B. Wild, and T. Landgraf, Rendergan: Generating realistic labeled data, Frontiers in Robotics and AI, vol.5, p.66, 2018.

J. Snell, K. Swersky, and R. Zemel, Prototypical networks for few-shot learning, Advances in Neural Information Processing Systems (NIPS), 2017.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

H. Su, C. R. Qi, Y. Li, and L. J. Guibas, Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views, Proceedings of the International Conference on Computer Vision (ICCV), 2015.

C. Sun, A. Shrivastava, S. Singh, and A. Gupta, Revisiting unreasonable effectiveness of data in deep learning era, Proceedings of the International Conference on Computer Vision (ICCV), pp.843-852, 2017.

M. Teichmann, M. Weber, M. Zoellner, R. Cipolla, and R. Urtasun, Multinet: Real-time joint semantic reasoning for autonomous driving, IEEE Intelligent Vehicles Symposium (IV), pp.1013-1020, 2018.

S. Thrun, Lifelong learning algorithms, Learning to learn, pp.181-209, 1998.

P. Tokmakov, Y. Wang, and M. Hebert, Learning compositional representations for few-shot recognition, Proceedings of the International Conference on Computer Vision (ICCV), 2019.

A. Torralba, Contextual priming for object detection, International Journal of Computer Vision, vol.53, issue.2, pp.169-191, 2003.

A. Torralba and P. Sinha, Statistical context priming for object detection, Proceedings of the International Conference on Computer Vision (ICCV), 2001.

T. Tran, T. Pham, G. Carneiro, L. Palmer, and I. Reid, A bayesian data augmentation approach for learning deep models, Advances in Neural Information Processing Systems (NIPS), 2017.

J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, Selective search for object recognition, International Journal of Computer Vision (IJCV), vol.104, issue.2, pp.154-171, 2013.

J. Verbeek and B. Triggs, Scene segmentation with conditional random fields learned from partially labeled images, Advances in Neural Information Processing Systems (NIPS), 2008.

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and composing robust features with denoising autoencoders, 2008.

O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra, Matching networks for one shot learning, Advances in Neural Information Processing Systems (NIPS), 2016.

P. Viola and M. Jones, Robust real-time object detection, 2001.

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, The Caltech-UCSD Birds-200-2011 Dataset, 2011.

C. Weng, D. Yu, M. L. Seltzer, and J. Droppo, Deep neural networks for single-channel multi-talker speech recognition, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.23, issue.10, pp.1670-1679, 2015.

H. Wu, S. Zheng, J. Zhang, and K. Huang, Gp-gan: Towards realistic high-resolution image blending, ACM International Conference on Multimedia (ACMMM), 2019.

J. Yang, J. Lu, D. Batra, and D. Parikh, A faster pytorch implementation of faster r-cnn, 2017.

J. Yang, B. Price, S. Cohen, and M. Yang, Context driven scene parsing with attention to rare classes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

B. Yao and L. Fei-fei, Modeling mutual context of object and human pose in human-object interaction activities, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

J. Yao, S. Fidler, and R. Urtasun, Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

H.-J. Ye, H. Hu, D.-C. Zhan, and F. Sha, Learning embedding adaptation for few-shot learning, 2018.

K. M. Yi, E. Trulls, V. Lepetit, and P. Fua, Lift: Learned invariant feature transform, Proceedings of the European Conference on Computer Vision (ECCV), pp.467-483, 2016.

Y. Yoshida and T. Miyato, Spectral norm regularization for improving the generalizability of deep learning, 2017.

F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, International Conference on Learning Representations (ICLR), 2016.

R. Yu, X. Chen, V. I. Morariu, and L. S. Davis, The role of context selection in object detection, British Machine Vision Conference (BMVC), 2016.

S. Zagoruyko and N. Komodakis, Wide residual networks, British Machine Vision Conference (BMVC), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832503

H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, mixup: Beyond empirical risk minimization, International Conference on Learning Representations (ICLR), 2018.

Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, Random erasing data augmentation, 2017.

Y. Zhou, Y. Zhu, Q. Ye, Q. Qiu, and J. Jiao, Weakly supervised instance segmentation using class peak response, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

B. Zoph, E. D. Cubuk, G. Ghiasi, T. Lin, J. Shlens et al., Learning data augmentation strategies for object detection, 2019.