G. Bakır, L. Bottou, and J. Weston, Breaking SVM complexity with cross-training, Advances in Neural Information Processing Systems, pp.81-88, 2005.

K. Barnard and M. Johnson, Word sense disambiguation with pictures, Artificial Intelligence, vol.167, issue.1-2, pp.13-30, 2005.
DOI : 10.1016/j.artint.2005.04.009

S. Becker and Y. Le Cun, Improving the convergence of backpropagation: Learning with second-order methods, Proceedings of the 1988 Connectionist Models Summer School, 1989.

Y. Bengio and Y. Le Cun, Scaling learning algorithms towards AI, Large Scale Kernel Machines, 2007.

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, p.140, 2009.
DOI : 10.1145/1553374.1553380

Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2009.
DOI : 10.1561/2200000006

K. P. Bennett and E. J. Bredensteiner, Duality and geometry in SVM classifiers, Proceedings of the 17th International Conference on Machine Learning, 2000.

A. Bordes and L. Bottou, The Huller: A Simple and Efficient Online SVM, Machine Learning: ECML 2005, pp.505-512, 2005.
DOI : 10.1007/11564096_48
URL : https://hal.archives-ouvertes.fr/hal-00752501

A. Bordes, S. Ertekin, J. Weston, and L. Bottou, Fast kernel classifiers with online and active learning, Journal of Machine Learning Research, vol.6, pp.1579-1619, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00752361

A. Bordes, L. Bottou, P. Gallinari, and J. Weston, Solving multiclass support vector machines with LaRank, Proceedings of the 24th International Conference on Machine Learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273508
URL : https://hal.archives-ouvertes.fr/hal-00750277

A. Bordes, N. Usunier, and L. Bottou, Sequence Labelling SVMs Trained in One Pass, ECML PKDD 2008, pp.146-161, 2008.
DOI : 10.1007/978-3-540-87479-9_28
URL : https://hal.archives-ouvertes.fr/hal-00752369

A. Bordes, L. Bottou, and P. Gallinari, SGD-QN: Careful quasi-Newton stochastic gradient descent, Journal of Machine Learning Research, vol.10, pp.1737-1754, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00750911

L. Bottou and O. Bousquet, The tradeoffs of large scale learning, Advances in Neural Information Processing Systems, 2008.

L. Bottou and Y. Le Cun, On-line learning for very large data sets, Applied Stochastic Models in Business and Industry, vol.21, issue.2, pp.137-151, 2005.
DOI : 10.1002/asmb.538

L. Bottou and C.-J. Lin, Support vector machine solvers, Large Scale Kernel Machines, pp.301-320, 2007.

L. Bottou, Online algorithms and stochastic approximations, Online Learning and Neural Networks, 1998.

L. Bottou, Stochastic gradient descent on toy problems, 2007.

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

C. Campbell, N. Cristianini, and A. Smola, Query learning with large margin classifiers, Proceedings of the 17th International Conference on Machine Learning, 2000.

G. Cauwenberghs and T. Poggio, Incremental and decremental support vector machine learning, Advances in Neural Information Processing Systems, 2001.

K. Crammer and Y. Singer, On the algorithmic implementation of multiclass kernel-based vector machines, Journal of Machine Learning Research, vol.2, pp.265-292, 2001.

K. Crammer and Y. Singer, Ultraconservative Online Algorithms for Multiclass Problems, Journal of Machine Learning Research, vol.3, pp.951-991, 2003.
DOI : 10.1007/3-540-44581-1_7

K. Crammer and Y. Singer, Loss Bounds for Online Category Ranking, Proceedings of the 18th Annual Conference on Computational Learning Theory (COLT05), 2005.
DOI : 10.1007/11503415_4

K. Crammer, J. Kandola, and Y. Singer, Online classification on a budget, Advances in Neural Information Processing Systems, 2004.

K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, Online passive-aggressive algorithms, Journal of Machine Learning Research, vol.7, pp.551-585, 2006.

D. Crisp and C. J. Burges, A geometric interpretation of ν-SVM classifiers, Advances in Neural Information Processing Systems, 2000.

N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and other kernel-based learning methods, 2000.
DOI : 10.1017/CBO9780511801389

H. Daumé III and D. Marcu, Learning as search optimization, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102373

H. Daumé III, J. Langford, and D. Marcu, Search-based structured prediction as classification, NIPS Workshop on Advances in Structured Learning for Text and Speech Processing, 2005.

L. Denoyer and P. Gallinari, The XML document mining challenge, Advances in XML Information Retrieval and Evaluation, 5th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX06), 2006.

C. Domingo and O. Watanabe, MadaBoost: a modification of AdaBoost, Proceedings of the 13th Annual Conference on Computational Learning Theory (COLT00), 2000.

X. Driancourt, Optimisation par descente de gradient stochastique de systèmes modulaires combinant réseaux de neurones et programmation dynamique, 1994.

B. Eisenberg and R. Rivest, On the sample complexity of PAC learning using random and chosen examples, Proceedings of the 3rd Annual ACM Workshop on Computational Learning Theory, 1990.

S. Ertekin, J. Huang, L. Bottou, and C. L. Giles, Learning on the border: active learning in imbalanced data classification, Proceedings of the 16th ACM Conference on Information and Knowledge Management, CIKM '07, 2007.
DOI : 10.1145/1321440.1321461

S. Ertekin, J. Huang, and C. L. Giles, Active learning for class imbalance problem, Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '07, 2007.
DOI : 10.1145/1277741.1277927

V. Fabian, Asymptotically Efficient Stochastic Approximation; The RM Case, The Annals of Statistics, vol.1, issue.3, pp.486-495, 1973.
DOI : 10.1214/aos/1176342414

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research, vol.9, pp.1871-1874, 2008.

V. Fedorov, Theory of Optimal Experiments, 1972.

J. Feldman et al., L0: The first five years of an automated language acquisition project, Artificial Intelligence Review, vol.49, issue.1-2, pp.103-129, 1996.
DOI : 10.1007/BF00159218

M. Fleischman and D. Roy, Intentional context in situated natural language learning, Proceedings of the Ninth Conference on Computational Natural Language Learning, CONLL '05, 2005.
DOI : 10.3115/1706543.1706562

M. Fleischman and D. Roy, Situated models of meaning for sports video retrieval, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, NAACL '07, 2007.
DOI : 10.3115/1614108.1614118

V. Franc and S. Sonnenburg, OCAS optimized cutting plane algorithm for support vector machines, Proceedings of the 25th International Machine Learning Conference (ICML08), Omnipress, 2008.

Y. Freund and R. E. Schapire, Large margin classification using the perceptron algorithm, Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT '98, 1998.
DOI : 10.1145/279943.279985

T.-T. Frieß, N. Cristianini, and C. Campbell, The kernel Adatron algorithm: a fast and simple learning procedure for support vector machines, Proceedings of the 15th International Conference on Machine Learning, 1998.

T. S. Furey, N. Cristianini, N. Duffy, D. W. Bednarski, M. Schummer, and D. Haussler, Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, vol.16, issue.10, pp.906-914, 2000.
DOI : 10.1093/bioinformatics/16.10.906

C. Gentile, A new approximate maximal margin classification algorithm, Journal of Machine Learning Research, vol.2, pp.213-242, 2001.

E. Gilbert, An Iterative Procedure for Computing the Minimum of a Quadratic Form on a Convex Set, SIAM Journal on Control, vol.4, issue.1, pp.61-79, 1966.
DOI : 10.1137/0304007

H. P. Graf, E. Cosatto, L. Bottou, I. Durdanovic, and V. Vapnik, Parallel support vector machines: The Cascade SVM, Advances in Neural Information Processing Systems, 2005.

R. B. Gramacy, M. K. Warmuth, S. A. Brandt, and I. Ari, Adaptive caching by refetching, Advances in Neural Information Processing Systems, pp.1465-1472, 2003.

I. Guyon, B. Boser, and V. Vapnik, Automatic capacity tuning of very large VC-dimension classifiers, Advances in Neural Information Processing Systems, 1993.

P. Haffner, Escaping the convex hull with extrapolated vector machines, Advances in Neural Information Processing Systems, pp.753-760, 2002.

S. Har-Peled, D. Roth, and D. Zimak, Constraint classification for multiclass classification and ranking, Advances in Neural Information Processing Systems, pp.785-792, 2002.

S. Harnad, The symbol grounding problem, Physica D: Nonlinear Phenomena, vol.42, issue.1-3, pp.335-346, 1990.
DOI : 10.1016/0167-2789(90)90087-6

C. Hildreth, A quadratic programming procedure, Naval Research Logistics Quarterly, vol.4, issue.1, pp.79-85, 1957.
DOI : 10.1002/nav.3800040113

G. E. Hinton, S. Osindero, and Y.-W. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527

C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan, A dual coordinate descent method for large-scale linear SVM, Proceedings of the 25th International Conference on Machine Learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390208

C.-W. Hsu and C.-J. Lin, A comparison of methods for multi-class support vector machines, IEEE Transactions on Neural Networks, vol.13, pp.415-425, 2002.

T. Joachims, Making large-scale SVM learning practical, Advances in Kernel Methods - Support Vector Learning, pp.169-184, 1999.

T. Joachims, The Maximum-Margin Approach to Learning Text Classifiers: Methods, Theory, and Algorithms, 2000.

T. Joachims, Training linear SVMs in linear time, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, 2006.
DOI : 10.1145/1150402.1150429

R. J. Kate and R. J. Mooney, Using string-kernels for learning semantic parsers, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, ACL '06, 2006.
DOI : 10.3115/1220175.1220290

R. J. Kate and R. J. Mooney, Learning language semantics from ambiguous supervision, Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI07), 2007.

S. S. Keerthi and E. G. Gilbert, Convergence of a generalized SMO algorithm for SVM classifier design, Machine Learning, vol.46, issue.1/3, pp.351-360, 2002.
DOI : 10.1023/A:1012431217818

S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy, A fast iterative nearest point algorithm for support vector machine classifier design, IEEE Transactions on Neural Networks, vol.11, issue.1, 1999.
DOI : 10.1109/72.822516

P. Kingsbury and M. Palmer, From treebank to propbank, Proceedings of the 3rd International Conference on Language Resources and Evaluation, 2002.

T. Kudoh and Y. Matsumoto, Use of support vector learning for chunk identification, Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, 2000.
DOI : 10.3115/1117601.1117635

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the 18th International Conference on Machine Learning (ICML01), 2001.

P. Laskov et al., Intrusion Detection in Unlabeled Data with Quarter-sphere Support Vector Machines, Proceedings of Conference on Detection of Intrusions, Malware and Vulnerability Assessment, 2004.
DOI : 10.1515/PIKO.2004.228

P. Laskov, C. Gehl, S. Krüger, and K.-R. Müller, Incremental support vector learning: Analysis, implementation and applications, Journal of Machine Learning Research, vol.7, pp.1909-1936, 2006.

Y. Le Cun, L. Bottou, and Y. Bengio, Reading checks with graph transformer networks, International Conference on Acoustics, Speech, and Signal Processing, pp.151-154, 1997.

Y. Le Cun, L. Bottou, G. B. Orr, and K.-R. Müller, Efficient backprop, Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science LNCS 1524, 1998.

Y. Le Cun, S. Chopra, R. Hadsell, M. Ranzato, and F.-J. Huang, A tutorial on energy-based learning, in Bakır et al., pp.192-241, 2007.

D. D. Lewis, Y. Yang, T. G. Rose, and F. Li, RCV1: A new benchmark collection for text categorization research, Journal of Machine Learning Research, vol.5, pp.361-397, 2004.

Y. Li and P. M. Long, The relaxed online maximum margin algorithm, Machine Learning, pp.361-387, 2002.

C.-J. Lin, On the convergence of the decomposition method for support vector machines, IEEE Transactions on Neural Networks, vol.12, issue.6, pp.1288-1298, 2001.

N. Littlestone and M. Warmuth, Relating data compression and learnability, 1986.

G. Loosli, S. Canu, and L. Bottou, Training invariant support vector machines using selective sampling, Large Scale Kernel Machines, pp.301-320, 2007.

J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, Identifying suspicious URLs, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553462

D. MacKay, Information-Based Objective Functions for Active Data Selection, Neural Computation, vol.4, issue.4, pp.589-603, 1992.
DOI : 10.1088/0266-5611/1/3/006

F. Maes, L. Denoyer, and P. Gallinari, Sequence labelling with reinforcement learning and ranking algorithms, Machine Learning: ECML 2007, 2007.
URL : https://hal.archives-ouvertes.fr/hal-01336187

C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, 1999.

G. Miller, WordNet: a lexical database for English, Communications of the ACM, vol.38, issue.11, pp.39-41, 1995.
DOI : 10.1145/219717.219748

R. Mooney, Learning to connect language and perception, Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI08), 2008.

L. Morgado and C. Pereira, Incremental Kernel Machines for Protein Remote Homology Detection, Hybrid Artificial Intelligence Systems, Lecture Notes in Computer Science, pp.409-416, 2009.
DOI : 10.1007/978-3-642-02319-4_49

N. Murata and S. Amari, Statistical analysis of learning dynamics, Signal Processing, vol.74, issue.1, pp.3-28, 1999.
DOI : 10.1016/S0165-1684(98)00206-0

N. Murata and T. Onoda, Estimation of power consumption for household electric appliances, Proceedings of the 9th International Conference on Neural Information Processing, ICONIP '02, pp.2299-2303, 2002.
DOI : 10.1109/ICONIP.2002.1201903

N. Nilsson, Machine Learning, 1965.
DOI : 10.1017/CBO9780511819346.034

J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation, vol.35, issue.151, pp.773-782, 1980.
DOI : 10.1090/S0025-5718-1980-0572855-7

A. Novikoff, On convergence proofs on perceptrons, Proceedings of the Symposium on the Mathematical Theory of Automata, 1962.

J. Platt, Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods - Support Vector Learning, pp.185-208, 1999.

S. Pradhan, W. Ward, K. Hacioglu, J. H. Martin, and D. Jurafsky, Shallow semantic parsing using support vector machines, Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (HLT-NAACL04), 2004.

L. R. Rabiner and B. H. Juang, An introduction to hidden Markov models, IEEE ASSP Magazine, vol.3, issue.1, 1986.
DOI : 10.1109/MASSP.1986.1165342

R. Rifkin and A. Klautau, In defense of one-vs-all classification, Journal of Machine Learning Research, vol.5, pp.101-141, 2004.

F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, pp.386-408, 1958.
DOI : 10.1037/h0042519

D. Roy and E. Reiter, Connecting language to the world, Artificial Intelligence, vol.167, issue.1-2, pp.1-12, 2005.

G. Schohn and D. Cohn, Less is more: Active learning with support vector machines, Proceedings of the 17th International Conference on Machine Learning, 2000.

N. N. Schraudolph, J. Yu, and S. Günter, A stochastic quasi-Newton method for online convex optimization, Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AIstats07), Society for Artificial Intelligence and Statistics, 2007.

A. Schrijver, Theory of Linear and Integer Programming, 1986.

F. Sha and F. Pereira, Shallow parsing with conditional random fields, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL '03, 2003.
DOI : 10.3115/1073445.1073473

S. Shalev-Shwartz and Y. Singer, A primal-dual perspective of online learning algorithms, Machine Learning, pp.115-142, 2007.
DOI : 10.1007/s10994-007-5014-x

S. Shalev-Shwartz and Y. Singer, A unified algorithmic approach for efficient online label ranking, Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AIstats07), Society for Artificial Intelligence and Statistics, 2007.

S. Shalev-Shwartz, Y. Singer, and N. Srebro, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Proceedings of the 24th International Conference on Machine Learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273598

J. Siskind, Grounding language in perception, Artificial Intelligence Review, vol.12, issue.1, pp.371-391, 1994.
DOI : 10.1007/BF00849726

A. Smola, S. V. N. Vishwanathan, and Q. Le, Bundle methods for machine learning, Advances in Neural Information Processing Systems, pp.1377-1384, 2008.

W. M. Soon, H. T. Ng, and D. C. Y. Lim, A Machine Learning Approach to Coreference Resolution of Noun Phrases, Computational Linguistics, vol.27, issue.4, pp.521-544, 2001.
DOI : 10.1093/ijl/3.4.235

N. Takahashi and T. Nishi, On termination of the SMO algorithm for support vector machines, Proceedings of International Symposium on Information Science and Electrical Engineering 2003, 2003.

B. Taskar, C. Guestrin, and D. Koller, Max-margin Markov networks, Advances in Neural Information Processing Systems, 2004.

B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, Learning structured prediction models: a large margin approach, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102464

B. Taskar, Learning structured prediction models, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, 2004.
DOI : 10.1145/1102351.1102464

R. Thibadeau, Artificial Perception of Actions, Cognitive Science, vol.10, issue.2, pp.117-149, 1986.
DOI : 10.1207/s15516709cog1002_1

S. Tong and D. Koller, Support vector machine active learning with applications to text classification, Proceedings of the 17th International Conference on Machine Learning, 2000.

K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, Feature-rich part-of-speech tagging with a cyclic dependency network, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL '03, 2003.
DOI : 10.3115/1073445.1073478

I. W. Tsang, J. T. Kwok, and P.-M. Cheung, Very large SVM training using core vector machines, Proceedings of the 10th International Conference on Artificial Intelligence and Statistics (AIstats05), Society for Artificial Intelligence and Statistics, 2005.

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large margin methods for structured and interdependent output variables, Journal of Machine Learning Research, vol.6, pp.1453-1484, 2005.

N. Usunier, D. Buffoni, and P. Gallinari, Ranking with ordered weighted pairwise classification, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553509
URL : https://hal.archives-ouvertes.fr/hal-01297974

V. Vapnik and A. Lerner, Pattern recognition using generalized portrait method, Automation and Remote Control, pp.774-780, 1963.

V. Vapnik et al., Algorithms and Programs for Dependency Estimation, Nauka, 1984.

V. Vapnik, Estimation of Dependences Based on Empirical Data, 1982.

V. Vapnik, Statistical Learning Theory, 1998.

L. von Ahn, B. Maurer, C. McMillen, D. Abraham, and M. Blum, reCAPTCHA: Human-Based Character Recognition via Web Security Measures, Science, vol.321, issue.5895, 2008.
DOI : 10.1126/science.1160379

L. von Ahn, Games with a Purpose, Computer, vol.39, issue.6, pp.96-98, 2006.
DOI : 10.1109/MC.2006.196

M. K. Warmuth, J. Liao, G. Rätsch, M. Mathieson, S. Putta, and C. Lemmen, Active Learning with Support Vector Machines in the Drug Discovery Process, ChemInform, vol.43, issue.22, pp.667-673, 2003.
DOI : 10.1002/chin.200322232

J. Weston and C. Watkins, Multi-class support vector machines, 1998.

J. Weston, A. Bordes, and L. Bottou, Online (and offline) on an even tighter budget, Proceedings of the 10th International Conference on Artificial Intelligence and Statistics (AIstats05), Society for Artificial Intelligence and Statistics, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00752500

T. Winograd, Understanding natural language, Cognitive Psychology, vol.3, issue.1, 1972.
DOI : 10.1016/0010-0285(72)90002-3

P. Winston, The psychology of computer vision, Pattern Recognition, vol.8, issue.3, pp.193-193, 1976.
DOI : 10.1016/0031-3203(76)90020-0

Y. W. Wong and R. J. Mooney, Learning synchronous grammars for semantic parsing with lambda calculus, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL07), 2007.

C. Yu and D. H. Ballard, On the Integration of Grounding Language and Learning Objects, Proceedings of the 19th AAAI Conference on Artificial Intelligence (AAAI04), 2004.

L. S. Zettlemoyer and M. Collins, Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars, Proceedings of Uncertainty in Artificial Intelligence (UAI05), 2005.

T. Zhang, F. Damerau, and D. Johnson, Text chunking based on a generalization of Winnow, Journal of Machine Learning Research, vol.2, pp.615-637, 2002.

G. Zoutendijk, Methods of Feasible Directions, 1960.

S. Sonnenburg, V. Franc, E. Yom-Tov, and M. Sebag, SGD-QN algorithm ranked 1st ex aequo over 42 international competitors, 2007.

L. Sgdqn, Fast Optimizers for Linear SVMs, 2008.