J. Aach and G. Church, Aligning gene expression time series with time warping algorithms, Bioinformatics, vol.17, issue.6, pp.495-508, 2001.
DOI : 10.1093/bioinformatics/17.6.495

H. Akaike, Information theory and an extension of the maximum likelihood principle, Proc. ISIT, 1973.

H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, vol.19, issue.6, pp.716-723, 1974.
DOI : 10.1109/TAC.1974.1100705

E. L. Allwein, R. E. Schapire, and Y. Singer, Reducing multiclass to binary: A unifying approach for margin classifiers, The Journal of Machine Learning Research, vol.1, pp.113-141, 2001.

D. Aloise, A. Deshpande, P. Hansen, and P. Popat, NP-hardness of euclidean sum-of-squares clustering, Machine Learning, pp.245-248, 2009.

S. Arlot and M. Lerasle, Why V= 5 is enough in V-fold cross-validation. arXiv preprint arXiv:1210, 2012.

S. Arlot, A. Celisse, and Z. Harchaoui, Kernel change-point detection. arXiv preprint, 2012.

Y. Artan, M. A. Haider, D. L. Langer, T. H. Van-der-kwast, A. J. Evans et al., Prostate Cancer Localization With Multispectral MRI Using Cost-Sensitive Support Vector Machines and Conditional Random Fields, IEEE Transactions on Image Processing, vol.19, issue.9, pp.2444-2455, 2010.
DOI : 10.1109/TIP.2010.2048612

A. Aspremont and S. Boyd, Relaxations and randomized methods for nonconvex QCQPs, 2003.

F. Bach, Learning with submodular functions: A convex optimization perspective. Foundations and Trends in ML, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00645271

F. Bach and M. Jordan, Learning spectral clustering, Adv. NIPS, 2003.

F. Bach and Z. Harchaoui, DIFFRAC: a discriminative and flexible framework for clustering, Adv. NIPS, 2008.

F. Bach and M. Jordan, Learning spectral clustering, with application to speech separation, The Journal of Machine Learning Research, vol.7, pp.1963-2001, 2006.

F. Bach, D. Heckerman, and E. Horvitz, Considering cost asymmetry in learning classifiers, The Journal of Machine Learning Research, vol.7, pp.1713-1741, 2006.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Machine Learning, pp.1-106, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00613125

R. Baeza-yates and B. Ribeiro-neto, Modern information retrieval, 1999.

J. Bai and P. Perron, Estimating and Testing Linear Models with Multiple Structural Changes, Econometrica, vol.66, issue.1, pp.47-78, 1998.
DOI : 10.2307/2998540

C. Banderier and S. Schwer, Why Delannoy numbers?, Journal of Statistical Planning and Inference, vol.135, issue.1, pp.40-54, 2005.
DOI : 10.1016/j.jspi.2005.02.004
URL : https://hal.archives-ouvertes.fr/hal-00085552

A. Bar-hillel, T. Hertz, N. Shental, and D. Weinshall, Learning a Mahalanobis metric from equivalence constraints, Journal of Machine Learning Research, vol.6, pp.937-965, 2005.

P. L. Bartlett, M. Jordan, and J. D. Mcauliffe, Convexity, Classification, and Risk Bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.
DOI : 10.1198/016214505000000907

M. Basseville and I. Nikiforov, Detection of abrupt changes: theory and application, 1993.
URL : https://hal.archives-ouvertes.fr/hal-00008518

A. Bellet, A. Habrard, and M. Sebban, A survey on metric learning for feature vectors and structured data, 2013.

R. Bellman, Dynamic Programming and Lagrange Multipliers, Proceedings of the National Academy of Sciences, vol.42, issue.10, p.767, 1956.
DOI : 10.1073/pnas.42.10.767

J. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies et al., A tutorial on onset detection in music signals, IEEE Transactions on Speech and Audio Processing, vol.13, issue.5, pp.1035-1047, 2005.
DOI : 10.1109/TSA.2005.851998

S. Belongie, J. Malik, and J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.4, pp.509-522, 2002.
DOI : 10.1109/34.993558

P. Berkhin, A Survey of Clustering Data Mining Techniques, Grouping multidimensional data, pp.25-71, 2006.
DOI : 10.1007/3-540-28349-8_2

D. J. Berndt and J. Clifford, Using dynamic time warping to find patterns in time series, Proc. KDD, 1994.

D. Bertsekas, Nonlinear programming, Athena Scientific, 1999.

D. Bertsekas, Dynamic programming and optimal control, 1995.

D. Bertsekas, Convex optimization algorithms, Athena Scientific, p.181, 2015.

W. Bi and J. T. Kwok, Efficient multi-label classification with many labels, Proc. ICML, 2013.

W. Bi and J. T. Kwok, Multi-label classification on tree- and DAG-structured hierarchies, Proc. ICML, 2011.

C. M. Bishop, Pattern recognition and machine learning, 2006.

P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid et al., Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991

P. Bojanowski, R. Lajugie, E. Grave, F. Bach, I. Laptev et al., Weakly-supervised alignment of video with text. arXiv preprint, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01154523

P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce et al., Weakly Supervised Action Labeling in Videos under Ordering Constraints, Proc. ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_41
URL : https://hal.archives-ouvertes.fr/hal-01053967

R. J. Bolton and D. J. Hand, Statistical fraud detection: A review, Statistical science, pp.235-249, 2002.

E. Borenstein and S. Ullman, Learning to Segment, Proc. ECCV, 2004.
DOI : 10.1007/978-3-540-24672-5_25

L. Bottou, Online learning and stochastic approximations. On-line learning in neural networks, p.142, 1998.

L. Bottou, Stochastic Gradient Descent Tricks, Neural Networks: Tricks of the Trade, pp.421-436, 2012.
DOI : 10.1137/1116025

O. Bousquet and L. Bottou, The tradeoffs of large scale learning, Adv. NIPS, 2008.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

Y. Boykov and V. Kolmogorov, An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, 2004.

P. S. Bradley, O. L. Mangasarian, and W. N. Street, Clustering via concave minimization, Adv. NIPS, 1997.

B. Brodsky and B. Darkhovsky, Nonparametric methods in change-point problems, 1993.
DOI : 10.1007/978-94-015-8163-9

S. Bubeck, Theory of convex optimization for machine learning. arXiv preprint, 2014.

T. S. Caetano, J. J. Mcauley, L. Cheng, Q. Le, and A. J. Smola, Learning graph matching, pp.1048-1058, 2009.

G. Carlier, Programmation dynamique, 1997.

A. Cauchy, Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes rendus des séances de l'Académie des sciences de Paris, pp.536-538, 1847.

A. Chambolle and J. Darbon, On Total Variation Minimization and Surface Evolution Using Parametric Maximum Flows, International Journal of Computer Vision, vol.40, issue.9, pp.288-307, 2009.
DOI : 10.1007/s11263-009-0238-9

J. Chen and A. Gupta, Parametric Statistical Change Point Analysis, Birkhäuser, 2011.

S. Chen, D. Donoho, and M. Saunders, Atomic Decomposition by Basis Pursuit, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.33-61, 1998.
DOI : 10.1137/S1064827596304010

Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.17, issue.8, pp.790-799, 1995.
DOI : 10.1109/34.400568

T. Cohn and P. Blunsom, Semantic role labelling with tree conditional random fields, Proceedings of the Ninth Conference on Computational Natural Language Learning, CONLL '05, 2005.
DOI : 10.3115/1706543.1706573

A. Cont, A Coupled Duration-Focused Architecture for Real-Time Music-to-Score Alignment, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.6, pp.974-987, 2010.
DOI : 10.1109/TPAMI.2009.106
URL : https://hal.archives-ouvertes.fr/hal-00479737

A. Cont, D. Schwarz, N. Schnell, and C. Raphael, Evaluation of real-time audio-to-score alignment, Proc. ISMIR, 2007.

T. H. Cormen and C. E. Leiserson, Introduction to algorithms, 2001.

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.
DOI : 10.1007/BF00994018

T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol.13, issue.1, pp.21-27, 1967.
DOI : 10.1109/TIT.1967.1053964

F. C. Crow, Summed-area tables for texture mapping, Proc. SIGGRAPH, pp.207-212, 1984.

M. Cuturi, J. Vert, O. Birkenes, and T. Matsui, A Kernel for Time Series Based on Global Alignments, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, 2007.
DOI : 10.1109/ICASSP.2007.366260

R. B. Dannenberg, An on-line algorithm for real-time accompaniment, Ann Arbor, 1984.

R. B. Dannenberg, An intelligent multi-track audio editor, Proc. ICMC, 2007.

J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, Information-theoretic metric learning, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273523

F. De la Torre and T. Kanade, Discriminative cluster analysis, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.241-248, 2006.
DOI : 10.1145/1143844.1143875

A. Defazio, F. Bach, and S. Lacoste-julien, Saga: A fast incremental gradient method with support for non-strongly convex composite objectives, Adv. NIPS, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

H. Delannoy, Sur une question de probabilité traitée par d'Alembert. Bulletin de la Société mathématique de France, pp.262-265

K. Dembczynski, A. Jachnik, W. Kotlowski, W. Waegeman, and E. Huellermeier, Optimizing the F-measure in multi-label classification: Plug-in rule approach versus structured loss minimization, Proc. ICML, 2013.

J. Deng, N. Ding, Y. Jia, A. Frome, K. Murphy et al., Large-Scale Object Classification Using Label Relation Graphs, Proc. ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_4

F. Desobry, M. Davy, and C. Doncarli, An online kernel change detection algorithm, IEEE Transactions on Signal Processing, vol.53, issue.8, pp.2961-2974, 2005.
DOI : 10.1109/TSP.2005.851098

S. Dixon and G. Widmer, Match: A music alignment tool chest, Proc. ISMIR, 2005.

J. S. Downie, The music information retrieval evaluation exchange (2005-2007): A window into music information retrieval research, Acoustical Science and Technology, vol.29, issue.4, pp.247-255, 2005.
DOI : 10.1250/ast.29.247

S. Du-manoir, E. Schröck, M. Bentz, M. R. Speicher, S. Joos et al., Quantitative analysis of comparative genomic hybridization, Cytometry, vol.318, issue.1, pp.27-41, 1995.
DOI : 10.1002/cyto.990190105

J. Edmonds and R. Karp, Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems, Journal of the ACM, vol.19, issue.2, pp.248-264, 1972.
DOI : 10.1145/321694.321699

T. Eerola and P. Toiviainen, Suomen kansan esavelmat. Finnish folk song database, 2004.

A. Elisseeff and J. Weston, A kernel method for multi-labelled classification, Adv. NIPS, 2001.

M. Everingham, L. Van-gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, Liblinear: A library for large linear classification, The Journal of Machine Learning Research, vol.9, pp.1871-1874, 2008.

T. Finley and T. Joachims, Supervised clustering with support vector machines, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102379

T. Finley and T. Joachims, Supervised k-means clustering, 2008.

R. A. Fisher, Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population, Biometrika, vol.10, issue.4, pp.507-521, 1915.
DOI : 10.2307/2331838

E. Fix and J. L. Hodges-jr, Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties, International Statistical Review / Revue Internationale de Statistique, vol.57, issue.3, 1951.
DOI : 10.2307/1403797

L. R. Ford and D. R. Fulkerson, Maximal flow through a network, Journal canadien de mathématiques, vol.8, issue.0, pp.399-404, 1956.
DOI : 10.4153/CJM-1956-045-5

D. Forsyth and J. Ponce, Computer Vision: A Modern Approach, 2002.
URL : https://hal.archives-ouvertes.fr/hal-01063327

G. Forsythe and G. Golub, On the Stationary Values of a Second-Degree Polynomial on the Unit Sphere, Journal of the Society for Industrial and Applied Mathematics, vol.13, issue.4, pp.1050-1068, 1965.
DOI : 10.1137/0113073

M. Frank and P. Wolfe, An algorithm for quadratic programming. Naval research logistics quarterly, pp.95-110, 1956.

S. Fujishige, Submodular functions and optimization, 2005.

W. Gander, G. H. Golub, and U. Matt, A constrained eigenvalue problem, Linear Algebra and its applications, vol.114, pp.815-839, 1989.

C. Gauss, Theoria motus corporum coelestium in sectionibus conicis solem ambientium. sumtibus Frid, 1809.

S. Geisser, The Predictive Sample Reuse Method with Applications, Journal of the American Statistical Association, vol.36, issue.2, pp.320-328, 1975.
DOI : 10.1080/01621459.1975.10479865

A. Ghias, J. Logan, D. Chamberlin, and B. C. Smith, Query by humming, Proceedings of the third ACM international conference on Multimedia, MULTIMEDIA '95, 1995.
DOI : 10.1145/217279.215273

O. Gillet, S. Essid, and G. Richard, On the Correlation of Automatic Audio and Visual Segmentations of Music Videos, IEEE Transactions on Circuits and Systems for Video Technology, vol.17, issue.3, pp.347-355, 2007.
DOI : 10.1109/TCSVT.2007.890831

M. Goemans and D. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM, vol.42, issue.6, pp.1115-1145, 1995.
DOI : 10.1145/227683.227684

B. Gold and N. Morgan, Speech and audio signal processing, 2000.
DOI : 10.1002/9781118142882

B. Gold, N. Morgan, and D. Ellis, Speech and audio signal processing: processing and perception of speech and music, 2011.
DOI : 10.1002/9781118142882

J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov, Neighbourhood component analysis, Adv. NIPS, 2004.

J. C. Gower and G. J. Ross, Minimum Spanning Trees and Single Linkage Cluster Analysis, Applied Statistics, vol.18, issue.1, pp.54-64, 1969.
DOI : 10.2307/2346439

E. Grave, Weakly supervised named entity classification, Proc. Workshop AKBC, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01095596

D. Greig, B. Porteous, and A. H. Seheult, Exact maximum a posteriori estimation for binary images, Journal of the Royal Statistical Society. Series B (Methodological), pp.271-279, 1989.

Y. Guo and D. Schuurmans, Convex relaxations of latent variable training, Adv. NIPS, 2007.

S. B. Guthery, Partition Regression, Journal of the American Statistical Association, vol.37, issue.348, pp.945-947, 1974.
DOI : 10.2307/2284373

R. Hamming, Error Detecting and Error Correcting Codes, Bell System Technical Journal, vol.29, issue.2, pp.147-160, 1950.
DOI : 10.1002/j.1538-7305.1950.tb00463.x

Z. Harchaoui, E. Moulines, and F. Bach, Kernel change-point analysis, Adv. NIPS, 2009.

B. Hariharan, L. Zelnik-manor, S. V. Vishwanathan, and M. Varma, Large scale max-margin multi-label classification with priors, Proc. ICML, 2010.

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, 2009.

T. Hocking, G. Schleiermacher, I. Janoueix-lerosey, V. Boeva, J. Cappo et al., Learning smoothing models of copy number profiles using breakpoint annotations, BMC Bioinformatics, vol.14, issue.1, 2013.
DOI : 10.1186/gb-2004-5-10-r80
URL : https://hal.archives-ouvertes.fr/hal-00663790

A. Hoerl and R. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, vol.24, issue.1, pp.55-67, 1970.
DOI : 10.2307/1909769

A. E. Hoerl, Application of ridge analysis to regression problems, Chemical Engineering Progress, vol.58, pp.55-67, 1962.

D. Hsu, S. Kakade, J. Langford, and T. Zhang, Multi-label prediction via compressed sensing, NIPS, pp.772-780, 2009.

N. Hu, R. B. Dannenberg, and G. Tzanetakis, Polyphonic audio matching and alignment for music retrieval, Computer Science Department, p.521, 2003.

L. Hubert and P. Arabie, Comparing partitions, Journal of Classification, vol.78, issue.1, pp.193-218, 1985.
DOI : 10.1007/BF01908075

F. Itakura, Minimum prediction residual principle applied to speech recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.23, issue.1, pp.67-72, 1975.
DOI : 10.1109/TASSP.1975.1162641

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, Proc. ICML, 2013.

A. Jain and R. Dubes, Algorithms for clustering data

P. Jain, B. Kulis, J. Davis, and I. Dhillon, Metric and kernel learning using a linear transformation, Journal of Machine Learning Research, vol.13, pp.519-547, 2012.

R. Jenatton, A. Gramfort, V. Michel, G. Obozinski, E. Eger et al., Multiscale Mining of fMRI Data with Hierarchical Structured Sparsity, SIAM Journal on Imaging Sciences, vol.5, issue.3, pp.835-856, 2012.
DOI : 10.1137/110832380
URL : https://hal.archives-ouvertes.fr/inria-00589785

T. Joachims, Text categorization with Support Vector Machines: Learning with many relevant features, Proc. ECML, 1998.
DOI : 10.1007/BFb0026683

T. Joachims, A support vector method for multivariate performance measures, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102399

T. Joachims, T. Finley, and C. J. Yu, Cutting-plane training of structural SVMs, Machine Learning, pp.27-59, 2009.
DOI : 10.1007/s10994-009-5108-8

C. Joder, S. Essid, and G. Richard, Learning Optimal Features for Polyphonic Audio-to-Score Alignment, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2118-2128, 2013.
DOI : 10.1109/TASL.2013.2266794

D. Johnson, The NP-completeness column: An ongoing guide, Journal of Algorithms, vol.3, issue.2, pp.182-195, 1982.
DOI : 10.1016/0196-6774(82)90018-9

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Adv. NIPS, 2013.

A. Joulin, F. Bach, and J. Ponce, Discriminative clustering for image cosegmentation, Proc. CVPR. IEEE, 2010.
DOI : 10.1109/cvpr.2010.5539868

A. Joulin, K. Tang, and L. Fei-fei, Efficient Image and Video Co-localization with Frank-Wolfe Algorithm, Proc. ECCV, pp.253-268
DOI : 10.1007/978-3-319-10599-4_17

M. Journée, F. Bach, P. Absil, and R. Sepulchre, Low-Rank Optimization on the Cone of Positive Semidefinite Matrices, SIAM Journal on Optimization, vol.20, issue.5, pp.2327-2351, 2010.
DOI : 10.1137/080731359

A. Kallioniemi, O. Kallioniemi, D. Sudar, D. Rutovitz, J. W. Gray et al., Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors, Science, vol.258, issue.5083, pp.818-821, 1992.
DOI : 10.1126/science.1359641

I. Katakis, G. Tsoumakas, and I. Vlahavas, Multilabel text classification for automated tag suggestion, Proc. ECML, 2008.

M. Kennedy, The Oxford dictionary of music, 1994.

J. Keshet, S. Shalev-shwartz, Y. Singer, and D. Chazan, A Large Margin Algorithm for Speech-to-Phoneme and Music-to-Score Alignment, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.8, pp.2373-2382, 2007.
DOI : 10.1109/TASL.2007.903928

S. Khot, G. Kindler, E. Mossel, and R. O'Donnell, Optimal Inapproximability Results for MAX-CUT and Other 2-Variable CSPs?, SIAM Journal on Computing, vol.37, issue.1, pp.319-357, 2007.
DOI : 10.1137/S0097539705447372

R. Killick, P. Fearnhead, and I. A. Eckley, Optimal Detection of Changepoints With a Linear Computational Cost, Journal of the American Statistical Association, vol.63, issue.500, pp.1590-1598, 2012.
DOI : 10.1080/01621459.2012.737745

S. Kim, S. Nowozin, P. Kohli, and C. Yoo, Task-Specific Image Partitioning, IEEE Transactions on Image Processing, vol.22, issue.2, pp.488-500, 2013.
DOI : 10.1109/TIP.2012.2218822
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.370.9801

H. Kirchhoff and A. Lerch, Evaluation of Features for Audio-to-Audio Alignment, Journal of New Music Research, vol.30, issue.1, pp.27-41, 2011.
DOI : 10.1016/j.jneumeth.2006.11.004

A. Kolesnikov, M. Guillaumin, V. Ferrari, and C. H. Lampert, Closed-Form Approximate CRF Training for Scalable Image Segmentation, Proc. ECCV, 2014.
DOI : 10.1007/978-3-319-10578-9_36

D. Kuettel, M. Guillaumin, and V. Ferrari, Segmentation Propagation in ImageNet, Proc. ECCV, 2012.
DOI : 10.1007/978-3-642-33786-4_34

B. Kulis, Metric Learning: A Survey, Machine Learning, pp.287-364, 2012.
DOI : 10.1561/2200000019

S. Kumar and M. Hebert, Discriminative Random Fields, International Journal of Computer Vision, vol.21, issue.1, pp.179-201, 2006.
DOI : 10.1007/s11263-006-7007-9
URL : http://repository.cmu.edu/cgi/viewcontent.cgi?article=1360&context=robotics

S. Lacoste-julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-coordinate Frank-Wolfe optimization for structural SVMs, Proc. ICML, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00720158

J. Lafferty, A. Mccallum, and F. C. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001.

J. Lagrange, Manière plus simple et plus générale de faire usage de la formule de l'équilibre, pp.77-112, 1811.

R. Lajugie, F. Bach, and S. Arlot, Large-margin metric learning for constrained partitioning problems, Proc. ICML, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00796921

R. Lajugie, D. Garreau, F. Bach, and S. Arlot, Metric learning for temporal sequence alignment, Adv. NIPS, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01062130

C. Lampert, Maximum margin multi-label structured prediction, Adv. NIPS, 2011.

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

M. Lavielle, Using penalized contrasts for the change-point problem, Signal Processing, vol.85, issue.8, pp.1501-1510, 2005.
DOI : 10.1016/j.sigpro.2005.01.012
URL : https://hal.archives-ouvertes.fr/inria-00070662

E. Lebarbier, Quelques approches pour la détection de ruptures à horizon fini, 2002.

E. Lebarbier, Detecting multiple change-points in the mean of Gaussian process by model selection, Signal Processing, vol.85, issue.4, pp.717-736, 2005.
DOI : 10.1016/j.sigpro.2004.11.012
URL : https://hal.archives-ouvertes.fr/inria-00071847

A. Legendre, Appendice sur la méthode des moindres carrés. Nouvelles méthodes pour la détermination de l'orbite des comètes, pp.72-80

C. S. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, Mismatch string kernels for discriminative protein classification, Bioinformatics, vol.20, issue.4, pp.467-476, 2004.
DOI : 10.1093/bioinformatics/btg431

T. Li, M. Ogihara, and Q. Li, A comparative study on content-based music genre classification, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '03, 2003.
DOI : 10.1145/860435.860487

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common Objects in Context, Proc. ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_48

J. Liu, B. Kuipers, and S. Savarese, Recognizing human actions by attributes, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995353
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.463.8447

T. Liu, Learning to Rank for Information Retrieval, Foundations and Trends in Information Retrieval, vol.3, issue.3, pp.225-331, 2009.
DOI : 10.1561/1500000016

D. G. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
DOI : 10.1109/ICCV.1999.790410

H. Masnadi-shirazi and N. Vasconcelos, Risk minimization, probability elicitation, and cost-sensitive SVMs, Proc. ICML, 2010.

P. Massart, Concentration inequalities and model selection, 2007.

D. Mcallester, Generalization bounds and consistency for structured labeling. Predicting structured data, 2007.

B. Mcfee and G. Lanckriet, Metric learning to rank, Proc. ICML, 2010.

B. Mcfee, L. Barrington, and G. Lanckriet, Learning Content Similarity for Music Recommendation, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.8, pp.2207-2218, 2012.
DOI : 10.1109/TASL.2012.2199109

K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors
URL : https://hal.archives-ouvertes.fr/inria-00548227

M. Müller, Information retrieval for music and motion, 2007.
DOI : 10.1007/978-3-540-74048-3

C. Myers and L. Rabiner, A Comparative Study of Several Dynamic Time-Warping Algorithms for Connected-Word Recognition, Bell System Technical Journal, vol.60, issue.7, pp.1389-1409, 1981.
DOI : 10.1002/j.1538-7305.1981.tb00272.x

A. Nemirovskii, Problem complexity and method efficiency in optimization

Y. Nesterov, Introductory lectures on convex optimization, 2004.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov, Primal-dual subgradient methods for convex problems, Mathematical Programming, vol.8, issue.1, pp.221-259, 2009.
DOI : 10.1007/s10107-007-0149-x

Y. Nesterov, A. Nemirovskii, and Y. Ye, Interior-point polynomial algorithms in convex programming, SIAM, 1994.
DOI : 10.1137/1.9781611970791

A. Ng and M. Jordan, On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes, Adv. NIPS, 2002.

A. Ng, Y. Weiss, and M. Jordan, On spectral clustering: Analysis and an algorithm, Adv. NIPS, 2002.

M. Nilsback and A. Zisserman, A Visual Vocabulary for Flower Classification, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.42

H. Noma, Dynamic time-alignment kernel in support vector machine, Adv. NIPS, 2002.

S. Nowozin and C. H. Lampert, Structured learning and prediction in computer vision. Foundations and Trends in Computer Graphics and Vision, issue.3-4, p.185, 2011.

H. Nyquist, Certain topics in telegraph transmission theory, Transactions of the American Institute of Electrical Engineers, pp.617-644, 1928.

N. Orio, S. Lemouton, and D. Schwarz, Score following: State of the art and new developments, Proc. NIME, 2003.

M. Overton and H. Wolkowicz, Semidefinite programming, Mathematical Programming, pp.105-109, 1997.
DOI : 10.1007/BF02614431

C. Papadimitriou and M. Yannakakis, Optimization, approximation, and complexity classes, Proceedings of the twentieth annual ACM symposium on Theory of computing, pp.229-234, 1988.

J. P. Pestian, C. Brew, P. Matykiewicz, D. J. Hovermale, N. Johnson et al., A shared task involving multi-label classification of clinical free text, Proceedings of the Workshop on BioNLP 2007 Biological, Translational, and Clinical Language Processing, BioNLP '07, 2007.
DOI : 10.3115/1572392.1572411

J. Petterson and T. Caetano, Submodular multi-label learning, Adv. NIPS, 2011.

D. Picard, Testing and estimating change-points in time series, Adv. Applied Probability, pp.841-867, 1985.

F. Picard, S. Robin, M. Lavielle, C. Vaisse, and J. Daudin, A statistical approach for array cgh data analysis, BMC Bioinformatics, vol.6, issue.1, p.27, 2005.
DOI : 10.1186/1471-2105-6-27
URL : https://hal.archives-ouvertes.fr/hal-00427846

W. Rand, Objective Criteria for the Evaluation of Clustering Methods, Journal of the American Statistical Association, vol.15, issue.336, pp.846-850, 1971.
DOI : 10.1080/01621459.1963.10500845

F. Rapaport, A. Zinovyev, M. Dutreix, E. Barillot, and J. Vert, Classification of microarray data using gene networks, BMC Bioinformatics, vol.8, issue.1, p.35, 2007.
DOI : 10.1186/1471-2105-8-35
URL : https://hal.archives-ouvertes.fr/hal-00433577

C. Raphael, Automatic segmentation of acoustic musical signals using hidden Markov models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.21, issue.4, pp.360-370, 1999.
DOI : 10.1109/34.761266

H. Robbins and S. Monro, A stochastic approximation method. The annals of mathematical statistics, pp.400-407, 1951.

R. Rockafellar, Convex Analysis, Princeton Mathematics Series, vol.28, 1970.
DOI : 10.1515/9781400873173

C. Rother, V. Kolmogorov, and A. Blake, "GrabCut", ACM Transactions on Graphics, vol.23, issue.3, pp.309-314, 2004.
DOI : 10.1145/1015706.1015720

J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-taylor, Kernel-based learning of hierarchical multilabel classification models, JMLR, 2006.

N. L. Roux, M. Schmidt, and F. R. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, Adv. NIPS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00674995

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.1010, issue.1, 2015.
DOI : 10.1007/s11263-015-0816-y

H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.26, issue.1, pp.43-49, 1978.
DOI : 10.1109/TASSP.1978.1163055

G. Schleiermacher, I. Janoueix-lerosey, A. Ribeiro, J. Klijanienko, J. Couturier et al., Accumulation of Segmental Alterations Determines Progression in Neuroblastoma, Journal of Clinical Oncology, vol.28, issue.19, pp.3122-3130, 2010.
DOI : 10.1200/JCO.2009.26.7955

M. Schmidt, B. Babanezhad, M. Ahmed, A. Defazio, A. Clifton et al., Non-uniform stochastic average gradient method for training conditional random fields, Proc. AISTATS, 2015.

B. Schölkopf and A. J. Smola, Learning with kernels: Support vector machines, regularization, optimization, and beyond, 2002.

G. Schwarz, Estimating the dimension of a model. The annals of statistics, pp.461-464, 1978.

B. Settles, Biomedical named entity recognition using conditional random fields and rich feature sets, Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, JNLPBA '04, pp.104-107, 2004.
DOI : 10.3115/1567594.1567618

F. Sha and F. Pereira, Shallow parsing with conditional random fields, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL '03, 2003.
DOI : 10.3115/1073445.1073473

S. P. Shah, W. L. Lam, R. T. Ng, and K. P. Murphy, Modeling recurrent DNA copy number alterations in array CGH data, Bioinformatics, vol.23, issue.13, pp.450-458, 2007.
DOI : 10.1093/bioinformatics/btm221

S. Shalev-shwartz, Y. Singer, and N. Srebro, Pegasos, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273598

S. Shalev-shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss, The Journal of Machine Learning Research, vol.14, issue.1, pp.567-599, 2013.

S. Shalev-shwartz, Y. Singer, N. Srebro, and A. Cotter, Pegasos, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.3-30, 2011.
DOI : 10.1145/1273496.1273598

C. E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, vol.27, issue.3, pp.379-423, 1948.
DOI : 10.1002/j.1538-7305.1948.tb01338.x

J. Shawe-taylor and N. Cristianini, Support vector machines. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, pp.93-112, 2000.

N. Shental, T. Hertz, D. Weinshall, and M. Pavel, Adjustment Learning and Relevant Component Analysis, ECCV 2002, pp.776-790, 2002.
DOI : 10.1007/3-540-47979-1_52
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.2871

J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Trans. on pattern analysis and machine intelligence, vol.22, pp.888-905, 1997.

Y. Shi, A. Bellet, and F. Sha, Sparse compositional metric learning, Proc. AAAI, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01430847

P. Simard, Y. Lecun, and J. S. Denker, Efficient pattern recognition using a new transformation distance, Adv. NIPS, 1993.

E. Spjøtvoll, A Note on a Theorem of Forsythe and Golub, SIAM Journal on Applied Mathematics, vol.23, issue.3, 1972.
DOI : 10.1137/0123032

H. Steinhaus, Sur la division des corps matériels en parties, Bull. Acad. Pol. Sci., Cl. III, vol.4, pp.801-804, 1957.

I. Steinwart and A. Christmann, Support vector machines, 2008.

S. M. Stigler, The Epic Story of Maximum Likelihood, Statistical Science, vol.22, issue.4, pp.598-620, 2007.
DOI : 10.1214/07-STS249

M. Stone, Cross-validatory choice and assessment of statistical predictions, Journal of the Royal Statistical Society. Series B (Methodological), pp.111-147, 1974.

P. Szummer, M. Kohli, and D. Hoiem, Learning CRFs Using Graph Cuts, ECCV, 2008.
DOI : 10.1007/978-3-540-88688-4_43
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.7402

B. Taskar, C. Guestrin, and D. Koller, Max-margin Markov networks, Adv. NIPS, 2003.

B. Taskar, S. Lacoste-julien, and M. Jordan, Structured prediction, dual extragradient and bregman projections, The Journal of Machine Learning Research, vol.7, pp.1627-1653, 2006.

A. Tewari and P. L. Bartlett, On the Consistency of Multiclass Classification Methods, The Journal of Machine Learning Research, vol.8, pp.1007-1025, 2007.
DOI : 10.1007/11503415_10

J. D. Thompson, F. Plewniak, and O. Poch, BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs, Bioinformatics, vol.15, issue.1, pp.87-88, 1999.
DOI : 10.1093/bioinformatics/15.1.87

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288, 1996.

F. Tisseur and K. Meerbergen, The Quadratic Eigenvalue Problem, SIAM Review, vol.43, issue.2, 2001.
DOI : 10.1137/S0036144500381988

A. Torres, A. Cabada, and J. J. Nieto, An Exact Formula for the Number of Alignments Between Two DNA Sequences, DNA Sequence, vol.19, issue.6, pp.427-430, 2003.
DOI : 10.1080/10425170310001617894

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large margin methods for structured and interdependent output variables, Journal of Machine Learning Research, vol.6, pp.1453-1484, 2005.

G. Tsoumakas and I. Katakis, Multi-label classification: An overview, International Journal of Data Warehousing and Mining (IJDWM), 2007.
DOI : 10.4018/jdwm.2007070101

V. Vapnik, Statistical learning theory, 1998.

B. Vercoe, Synthetic rehearsal: Training the synthetic performer, Proc. ICMC, 1985.

P. Viola and M. Jones, Fast and robust classification using asymmetric adaboost and a detector cascade, Adv. NIPS, 2001.

S. Vishwanathan, N. Schraudolph, M. Schmidt, and K. P. Murphy, Accelerated training of conditional random fields with stochastic gradient methods, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143966

A. J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, vol.13, issue.2, pp.260-269, 1967.
DOI : 10.1109/TIT.1967.1054010

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, pp.395-416, 2007.
DOI : 10.1007/s11222-007-9033-z

M. Wainwright and M. Jordan, Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends in Machine Learning, vol.1, issue.1-2, 2008.
DOI : 10.1561/2200000001

H. Wallach, Efficient training of conditional random fields, 2002.

H. Wang, A. Klaser, C. Schmid, and C. Liu, Action recognition by dense trajectories, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

K. Weinberger and L. Saul, Fast solvers and efficient implementations for distance metric learning, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1160-1167, 2008.
DOI : 10.1145/1390156.1390302

K. Weinberger and L. Saul, Distance metric learning for large margin nearest neighbor classification, The Journal of Machine Learning Research, vol.10, pp.207-244, 2009.

K. Weinberger, J. Blitzer, and L. Saul, Distance metric learning for large margin nearest neighbor classification, Adv. NIPS, 2006.

M. Welling, Robust higher order statistics, Proc. AISTATS, 2005.

M. Wertheimer, Laws of organization in perceptual forms. A Source Book of Gestalt Psychology, 1923.

J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, SUN database: Large-scale scene recognition from abbey to zoo, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539970

E. Xing, M. Jordan, S. Russell, and A. Ng, Distance metric learning with application to clustering with side-information, Adv. NIPS, 2002.

D. Yeung and H. Chang, Extending the relevant component analysis algorithm for metric learning using both positive and negative equivalence constraints, Pattern Recognition, vol.39, issue.5, pp.1007-1010, 2006.
DOI : 10.1016/j.patcog.2005.12.004

C. Yu and T. Joachims, Learning structural SVMs with latent variables, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553523

D. Zhan, M. Li, Y. Li, and Z. Zhou, Learning instance specific distances using metric propagation, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553530
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.158.2742

M. Zhang and Z. Zhou, A Review on Multi-Label Learning Algorithms, IEEE Transactions on Knowledge and Data Engineering, vol.26, issue.8, 2013.
DOI : 10.1109/TKDE.2013.39

T. Zhang, Statistical analysis of some multi-category large margin classification methods, The Journal of Machine Learning Research, vol.5, pp.1225-1251, 2004.

T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, The Annals of Statistics, vol.32, issue.1, pp.56-85, 2004.
DOI : 10.1214/aos/1079120130

P. Zhao and B. Yu, On model selection consistency of lasso, The Journal of Machine Learning Research, vol.7, pp.2541-2563, 2006.

J. Zhu and E. P. Xing, Conditional topic random fields, Proc. ICML, 2010.

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005.
DOI : 10.1111/j.1467-9868.2005.00503.x