E. Allgower and K. Georg, Numerical continuation methods: An introduction
DOI : 10.1137/1.9780898719154

N. Batmanghelich, B. Taskar, and C. Davatzikos, Generative-discriminative basis learning for medical imaging, IEEE Transactions on Medical Imaging, 2011.

P. Baudin, D. Goodman, P. Kumar, N. Azzabou, P. G. Carlier et al., Discriminative Parameter Estimation for Random Walks Segmentation
DOI : 10.1007/978-3-642-40760-4_28
URL : https://hal.archives-ouvertes.fr/hal-00856020

M. Bazaraa, H. Sherali, and C. Shetty, Nonlinear Programming: Theory and Algorithms, 1993.

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553380

M. Berger, G. Badis, A. Gehrke, and S. Talukder, Variation in Homeodomain DNA Binding Revealed by High-Resolution Analysis of Sequence Preferences, Cell, vol.133, issue.7, 2008.
DOI : 10.1016/j.cell.2008.05.024

C. Bishop, Pattern recognition and machine learning, 2006.

M. Blaschko, A. Vedaldi, and A. Zisserman, Simultaneous object detection and ranking with weak supervision, NIPS, 2010.

D. Blei and M. Jordan, Variational inference for Dirichlet process mixtures, Bayesian Analysis, vol.1, issue.1, 2006.
DOI : 10.1214/06-BA104

A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, Proceedings of the eleventh annual conference on Computational learning theory, COLT '98, 1998.
DOI : 10.1145/279943.279962

D. Cohn, Z. Ghahramani, and M. Jordan, Active learning with statistical models, JAIR, vol.4, pp.129-145, 1996.

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

A. Dempster, N. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, 1977.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, CVPR, 2009.

P. Felzenszwalb, D. McAllester, and D. Ramanan, A discriminatively trained, multiscale, deformable part model, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587597

T. Finley and T. Joachims, Supervised clustering with support vector machines, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102379

C. Floudas and V. Visweswaran, Primal-relaxed dual global optimization approach, Journal of Optimization Theory and Applications, vol.2, issue.2, pp.187-225, 1993.
DOI : 10.1007/BF00939667

A. Gelman, J. Carlin, H. Stern, and D. Rubin, Bayesian Data Analysis, 1995.

S. Gould, R. Fulton, and D. Koller, Decomposing a scene into geometric and semantically consistent regions, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459211

M. Guignard and S. Kim, Lagrangean decomposition: A model yielding stronger lagrangean bounds, Mathematical Programming, 1987.
DOI : 10.1007/BF02612335

J. Havrda and F. Charvat, Quantification method in classification processes: Concept of structural α-entropy, Kybernetika, 1967.

G. Heitz, G. Elidan, B. Packer, and D. Koller, Shape-Based Object Localization for Descriptive Classification, International Journal of Computer Vision, vol.26, issue.5, 2009.
DOI : 10.1007/s11263-009-0228-y

G. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, NIPS, 2006.
DOI : 10.1162/jmlr.2003.4.7-8.1235

G. Hinton and R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, 2006.
DOI : 10.1126/science.1127647

T. Jaakkola, M. Meila, and T. Jebara, Maximum entropy discrimination, NIPS, 1999.

J. Jancsary, S. Nowozin, T. Sharp, and C. Rother, Regression Tree Fields — An efficient, non-parametric approach to image labeling problems, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247950

E. Jaynes, Probability theory: The logic of science, 2003.
DOI : 10.1017/CBO9780511790423

T. Jebara, Discriminative, generative and imitative learning, 2001.

T. Jebara and T. Jaakkola, Feature selection and dualities in maximum entropy discrimination, UAI, 2000.

T. Joachims, T. Finley, and C. Yu, Cutting-plane training of structural SVMs, Machine Learning, 2009.
DOI : 10.1007/s10994-009-5108-8

D. Koller and N. Friedman, Probabilistic graphical models: Principles and techniques, 2009.

N. Komodakis, Efficient training for pairwise and higher order CRFs using dual decomposition, CVPR, 2011.

N. Komodakis and N. Paragios, Beyond pairwise energies: Efficient optimization for higher-order MRFs, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206846

N. Komodakis, N. Paragios, and G. Tziritas, MRF Optimization via Dual Decomposition: Message-Passing Revisited, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408890

M. P. Kumar and D. Koller, Efficiently selecting regions for scene understanding, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540072

M. P. Kumar, H. Turki, D. Preston, and D. Koller, Learning specific-class segmentation from diverse data, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126446

G. Kundu, V. Srikumar, and D. Roth, Margin-based decomposed amortized inference, ACL, 2013.

L. Ladicky, C. Russell, P. Kohli, and P. Torr, Associative hierarchical CRFs for object class image segmentation, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459248

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.
DOI : 10.1109/5.726791

H. Lee, R. Grosse, R. Ranganath, and A. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553453

F. Li, J. Carreira, and C. Sminchisescu, Object recognition as ranking holistic figure-ground hypotheses, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539839

S. Li, J. Graca, and B. Taskar, Wiki-ly supervised part-of-speech tagging, EMNLP, 2012.

S. Maji, L. Bourdev, and J. Malik, Action recognition from a distributed representation of pose and appearance, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995631

A. Mathai and P. Rathie, Basic Concepts in Information Theory and Statistics, 1974.

K. Miller, M. P. Kumar, B. Packer, D. Goodman, and D. Koller, Max-margin min-entropy models, AISTATS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00773602

T. Minka and Z. Ghahramani, Expectation propagation for infinite mixtures, NIPS Workshop on Nonparametric Bayesian Methods and Infinite Models, 2003.

R. Neal, Markov chain sampling methods for Dirichlet process mixture models, Journal of Computational and Graphical Statistics, 2000.

R. Neal and G. Hinton, A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, Learning in Graphical Models, 1999.
DOI : 10.1007/978-94-011-5014-9_12

V. Ng and C. Cardie, Improving machine learning approaches to coreference resolution, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, 2002.
DOI : 10.3115/1073083.1073102

K. Nigam and R. Ghani, Analyzing the effectiveness and applicability of co-training, Proceedings of the ninth international conference on Information and knowledge management, CIKM '00, 2000.
DOI : 10.1145/354756.354805

S. Nowozin and C. Lampert, Structured learning and prediction in computer vision, Foundations and Trends in Computer Graphics and Vision, 2010.

L. Pishchulin, A. Jain, M. Andriluka, B. Thormaehlen, and B. Schiele, Articulated people detection and pose estimation: Reshaping the future, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248052

C. Rao, Diversity and dissimilarity coefficients: A unified approach, Theoretical Population Biology, vol.21, issue.1, 1982.
DOI : 10.1016/0040-5809(82)90004-1

A. Renyi, On measures of information and entropy, Berkeley Symposium on Mathematics, Statistics and Probability, 1961.

J. Salojarvi, K. Puolamaki, and S. Kaski, Expectation maximization algorithms for conditional likelihoods, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102446

S. Shalev-Shwartz, Y. Singer, and N. Srebro, Pegasos, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273598

P. Simard, B. Victorri, Y. LeCun, and J. Denker, Tangent Prop - a formalism for specifying selected invariances in an adaptive network, NIPS, 1991.

A. Smola, S. V. Vishwanathan, and T. Hofmann, Kernel methods for missing variables, AISTATS, 2005.

B. Sriperumbudur and G. Lanckriet, On the convergence of concave-convex procedure, NIPS Workshop on Optimization for Machine Learning, 2009.

R. Sundberg, Maximum likelihood theory for incomplete data from an exponential family, Scandinavian Journal of Statistics, 1974.

B. Taskar, C. Guestrin, and D. Koller, Max-margin Markov networks, NIPS, 2003.

S. Tong and D. Koller, Support vector machine active learning with applications to text classification, JMLR, vol.2, pp.45-66, 2001.

I. Tsochantaridis, T. Hofmann, Y. Altun, and T. Joachims, Support vector machine learning for interdependent and structured output spaces, Twenty-first international conference on Machine learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015341

C. Yu and T. Joachims, Learning structural SVMs with latent variables, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553523

A. Yuille and A. Rangarajan, The Concave-Convex Procedure, Neural Computation, vol.39, issue.4, 2003.
DOI : 10.1162/08997660260028674

W. Zaremba, M. P. Kumar, A. Gramfort, and M. Blaschko, Learning from M/EEG Data with Variable Brain Activation Delays, IPMI, 2013.
DOI : 10.1007/978-3-642-38868-2_35
URL : https://hal.archives-ouvertes.fr/hal-00803981

K. Zhang, I. Tsang, and J. Kwok, Maximum margin clustering made practical, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273637