Linearly separable data

Bias-variance trade-off

Surrogate margin-based losses

Gradient descent

Frank-Wolfe optimization

Similar points in the sense of the partition

Example

Example of local models

Example of landmarks

Illustration of the influence of the local models based on region distances

Mean test loss on perceptual color distance estimation

Mean test loss on word similarity estimation

Centralized vs Decentralized learning

An extract of the synthetic dataset

The objective value for Dada w.r.t. number of iterations

Training and test accuracy w.r.t. number of iterations

Graph discovery evaluation with variable t w

Graph discovery evaluation with variable q

Classification of 2D-XOR distribution

Classification of 2D-Swiss-roll distribution

Mean test accuracies vs number of landmarks on UCI datasets

Maximal mean accuracy vs number of clusters on UCI datasets

Comparison of landmark selection techniques

Example of multi-view data

MVL-SVM method

Average test accuracies on multi-view datasets

Training and testing times on multi-view datasets

Average test accuracies w.r.t. the training time

Test accuracies on samples with missing views

Mean accuracies versus percentage of labeled data

Mean accuracies versus percentage of noise

Artificially induced label noise

Artificially induced label noise

Examples of minimal perturbations

Classification boundaries and confidence levels on toy datasets

B.4 Comparison of different defenses against white-box and black-box attacks on MNIST

Comparison of different defenses against white-box and black-box attacks on CIFAR10

E.1 Variable dependencies one SVM per cluster

List of Tables

1.1 Examples of l_p-norms for vectors and matrices

5.4 Testing Accuracies (%) and Training Speedups w.r.t. RBF-SVM

Mean accuracies and standard deviations on seven UCI datasets

Accuracy (%) on MNIST for FGSM attack transfer on different architectures (ε = 0.1, best result in bold, second best in gray)

White-box attack on adversarial examples from FGSM with minimal perturbation (best result in bold, second best in gray)

Local loss sensitivity analysis for defenses on MNIST (best result in bold, second best in gray)

Accuracy (%) for black-box attacks on MNIST (best result in bold, second best in gray)

Accuracy (%) for black-box attacks on CIFAR10 (best result in bold, second best in gray)

Decentralized Frank-Wolfe Graph Regularized Boosting

N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler, Scale-sensitive dimensions, uniform convergence, and learnability, Journal of the ACM (JACM), vol.44, issue.4, pp.615-631, 1997.

M. Amini, N. Usunier, and C. Goutte, Learning from multiple partially observed views-an application to multilingual text categorization, Advances in neural information processing systems, pp.28-36, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01297947

D. Angluin and P. Laird, Learning from noisy examples, vol.2, pp.343-370, 1988.

Y. Arjevani and O. Shamir, Communication complexity of distributed convex learning and optimization, Advances in neural information processing systems, 2015.

N. Aronszajn, Theory of reproducing kernels, Transactions of the American mathematical society, vol.68, issue.3, pp.337-404, 1950.

F. R. Bach, G. R. Lanckriet, and M. I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Proceedings of the twenty-first international conference on Machine learning, 2004.

G. Bakır, L. Bottou, and J. Weston, Breaking SVM complexity with cross training, vol.17, pp.81-88, 2005.

M. F. Balcan, A. Blum, S. Fine, and Y. Mansour, Distributed learning, communication complexity and privacy, COLT, 2012.

M. Balcan, A. Blum, and N. Srebro, Improved guarantees for learning via similarity functions, Computer Science Department, p.126, 2008.

M. Balcan, A. Blum, and N. Srebro, A theory of learning with similarity functions, Machine Learning, vol.72, pp.89-112, 2008.

M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, Can machine learning be secure?, Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, ASIACCS '06, p.209, 2006.

I. M. Baytas, M. Yan, A. K. Jain, and J. Zhou, Asynchronous Multi-task Learning, ICDM, 2016.

R. Bekkerman, M. Bilenko, and J. Langford, Scaling up machine learning: Parallel and distributed approaches, 2011.

A. Bellet, R. Guerraoui, M. Taziki, and M. Tommasi, Personalized and Private Peer-to-Peer Machine Learning, AISTATS, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01665422

A. Bellet and A. Habrard, Robustness and Generalization for Metric Learning, vol.151, pp.259-267, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01075370

A. Bellet, A. Habrard, and M. Sebban, A survey on metric learning for feature vectors and structured data, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01666935

A. Bellet, A. Habrard, and M. Sebban, Metric learning, vol.9, pp.1-151, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01121733

A. Bellet, Y. Liang, A. Bagheri Garakani, M. Balcan, and F. Sha, A distributed Frank-Wolfe algorithm for communication-efficient sparse learning, SDM, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01430851

S. Ben-David, D. Loker, N. Srebro, and K. Sridharan, Minimizing the misclassification error rate using a surrogate convex loss, Proceedings of the 29th International Conference on Machine Learning, ICML, 2012.

A. Bendale and T. E. Boult, Towards open set deep networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1563-1572, 2016.

S. Bhadra, S. Kaski, and J. Rousu, Multi-view kernel completion, vol.106, pp.713-739, 2017.

B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić et al., Evasion attacks against machine learning at test time, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.387-402, 2013.

B. Biggio, I. Corona, B. Nelson, B. I. P. Rubinstein, D. Maiorca et al., Security evaluation of support vector machines in adversarial environments, Support Vector Machines Applications, pp.105-153, 2014.

B. Biggio, G. Fumera, and F. Roli, Security evaluation of pattern classifiers under attack, IEEE Transactions on Knowledge and Data Engineering, vol.26, pp.984-996, 2014.

M. Bilenko, S. Basu, and R. Mooney, Integrating constraints and metric learning in semi-supervised clustering, Proceedings of the twenty-first international conference on Machine learning, p.11, 2004.

A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, Proceedings of the eleventh annual conference on Computational learning theory, pp.92-100, 1998.

A. Bordes, L. Bottou, and P. Gallinari, SGD-QN: Careful quasi-Newton stochastic gradient descent, vol.10, pp.1737-1754, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00750911

B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory, pp.144-152, 1992.

S. Boucheron, G. Lugosi, and O. Bousquet, Concentration inequalities, Advanced Lectures on Machine Learning, pp.208-240, Springer, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00777381

O. Bousquet and A. Elisseeff, Stability and generalization, Journal of Machine Learning Research, vol.2, pp.499-526, 2002.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, vol.3, pp.1-122, 2011.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, Randomized gossip algorithms, vol.52, pp.2508-2530, 2006.

D. S. Broomhead and D. Lowe, Radial basis functions, multi-variable functional interpolation and adaptive networks, Royal Signals and Radar Establishment Malvern, 1988.

L. Bruzzone, M. Chi, and M. Marconcini, A novel transductive SVM for semisupervised classification of remote-sensing images, vol.44, pp.3363-3373, 2006.

N. Carlini and D. Wagner, Defensive distillation is not robust to adversarial examples, 2016.

N. Carlini and D. Wagner, Adversarial examples are not easily detected: Bypassing ten detection methods, 2017.

N. Carlini and D. Wagner, Towards evaluating the robustness of neural networks, IEEE Symposium on Security and Privacy, 2017.

R. Caruana, Multitask learning, Machine Learning, vol.28, pp.41-75, 1997.

C. Chang and C. Lin, LIBSVM: a library for support vector machines, vol.2, p.27, 2011.

O. Chapelle, P. Shivaswamy, S. Vadrevu, K. Weinberger, Y. Zhang et al., Multi-task learning for boosting with application to web search ranking, KDD, 2010.

M. Chen and L. Denoyer, Multi-view generative adversarial networks, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.175-188, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02101339

M. Chen, L. Denoyer, and T. Artières, Multi-view data generation without view supervision, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02101404

Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall et al., The UCR time series classification archive, 2015.

I. Colin, A. Bellet, J. Salmon, and S. Clémençon, Gossip dual averaging for decentralized optimization of pairwise functions, ICML, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02107511

M. Collins, R. E. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman distances, vol.48, pp.253-285, 2002.

J. Cooper and L. Reyzin, Improved algorithms for distributed boosting, 2017.

C. Cortes and M. Mohri, AUC optimization vs. error rate minimization, Advances in neural information processing systems, pp.313-320, 2004.

C. Cortes and V. Vapnik, Support-vector networks, vol.20, pp.273-297, 1995.

T. Cover and P. Hart, Nearest neighbor pattern classification, vol.13, pp.21-27, 1967.

N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, Adversarial classification, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), KDD '04, pp.99-108, 2004.

J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, Information-theoretic metric learning, Proceedings of the 24th international conference on Machine learning, pp.209-216, 2007.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR09, 2009.

T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, vol.89, pp.31-71, 1997.

A. Domahidi, E. Chu, and S. Boyd, ECOS: An SOCP solver for embedded systems, Control Conference (ECC), pp.3071-3076, 2013.

P. Drineas and M. W. Mahoney, On the Nyström method for approximating a Gram matrix for improved kernel-based learning, Journal of Machine Learning Research, vol.6, pp.2153-2175, 2005.

J. C. Duchi, A. Agarwal, and M. J. Wainwright, Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling, vol.57, pp.592-606, 2012.

C. Dwork, Differential Privacy, ICALP, vol.2, 2006.

S. Džeroski and B. Ženko, Is combining classifiers with stacking better than selecting the best one?, Machine Learning, vol.54, pp.255-273, 2004.

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, vol.9, pp.1871-1874, 2008.

J. Farquhar, D. Hardoon, H. Meng, J. S. Shawe-Taylor, and S. Szedmak, Two view learning: SVM-2K, theory and practice, Advances in neural information processing systems, pp.355-362, 2006.

R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner, Detecting adversarial samples from artifacts, 2017.

M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum et al., Efficient and robust automated machine learning, Advances in Neural Information Processing Systems, pp.2962-2970, 2015.

M. Fornoni, B. Caputo, and F. Orabona, Multiclass latent locally linear support vector machines, ACML, pp.229-244, 2013.

M. Frank and P. Wolfe, An algorithm for quadratic programming, vol.3, pp.95-110, 1956.

M. Fréchet, Sur les fonctionnelles continues, Annales Scientifiques de l'École Normale Supérieure, vol.27, pp.193-216, 1910.

Y. Freund, R. Schapire, and N. Abe, A short introduction to boosting, Journal-Japanese Society For Artificial Intelligence, vol.14, p.1612, 1999.

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, vol.55, pp.119-139, 1997.

Y. Freund and R. E. Schapire, Experiments with a new boosting algorithm, ICML, vol.96, pp.148-156, 1996.

J. H. Friedman, Greedy function approximation: A gradient boosting machine, Annals of Statistics, vol.29, pp.1189-1232, 2001.

A. Frome, Y. Singer, and J. Malik, Image retrieval and classification using local distance functions, Advances in neural information processing systems, pp.417-424, 2007.

Z. Fu, A. Robles-Kelly, and J. Zhou, Mixing linear SVMs for nonlinear classification, vol.21, pp.1963-1975, 2010.

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp.315-323, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752497

M. Gönen and E. Alpaydın, Multiple kernel learning algorithms, vol.12, pp.2211-2268, 2011.

Z. Gong, W. Wang, and W. Ku, Adversarial and clean data are not twins, 2017.

I. J. Goodfellow, J. Shlens, and C. Szegedy, Explaining and harnessing adversarial examples, 2014.

A. Goyal, E. Morvant, P. Germain, and M. Amini, PAC-Bayesian analysis for a two-step hierarchical multiview learning approach, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.205-221, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01546109

K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. D. McDaniel, On the (statistical) detection of adversarial examples, 2017.

Q. Gu and J. Han, Clustered support vector machines, AISTATS, pp.307-315, 2013.

N. Halko, P. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, vol.53, pp.217-288, 2011.

T. Hastie and R. Tibshirani, Discriminant adaptive nearest neighbor classification and regression, Advances in Neural Information Processing Systems, pp.409-415, 1996.

T. Hastie, R. Tibshirani, and J. Friedman, Unsupervised learning, 2009.

S. Hauberg, O. Freifeld, and M. J. Black, A geometric take on metric learning, Advances in Neural Information Processing Systems, pp.2024-2032, 2012.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, 2015.

X. Huang, M. Kwiatkowska, S. Wang, and M. Wu, Safety verification of deep neural networks, 2016.

Y. Huang, C. Li, M. Georgiopoulos, and G. C. Anagnostopoulos, Reduced-rank local distance metric learning, Machine Learning and Knowledge Discovery in Databases, pp.224-239, 2013.

R. Huusari, H. Kadri, and C. Capponi, Multi-view metric learning in vector-valued kernel spaces, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, pp.415-424, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01736068

M. Jaggi, Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization, ICML, 2013.

T. Joachims, Training linear SVMs in linear time, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pp.217-226, 2006.

A. Joulin and F. R. Bach, A convex relaxation for weakly supervised classifiers, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00717450

H. Kadri, S. Ayache, C. Capponi, S. Koço, F. Dupé et al., The multi-task learning view of multimodal data, Asian Conference on Machine Learning, pp.261-276, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01070601

P. Kar and P. Jain, Similarity-based learning via data driven embeddings, Advances in neural information processing systems, 2011.

M. Kearns and M. Li, Learning in the presence of malicious errors, vol.22, pp.807-837, 1993.

M. Kearns and Y. Mansour, On the boosting ability of top-down decision tree learning algorithms, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pp.459-468, 1996.

M. J. Kearns and U. V. Vazirani, An introduction to computational learning theory, 1994.

V. Koltchinskii and D. Panchenko, Rademacher processes and bounding the risk of function learning, High dimensional probability II, pp.443-457, 2000.

J. Konečný, H. B. McMahan, D. Ramage, and P. Richtárik, Federated Optimization: Distributed Machine Learning for On-Device Intelligence, 2016.

J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, Federated learning: Strategies for improving communication efficiency, 2016.

A. Krizhevsky, V. Nair, and G. Hinton, CIFAR-10 (Canadian Institute for Advanced Research), 2009.

D. Krueger, N. Ballas, S. Jastrzebski, D. Arpit, M. S. Kanwal et al., A closer look at memorization in deep networks, International Conference on Machine Learning (ICML), 2017.

B. Kulis, Metric learning: A survey, vol.5, pp.287-364, 2012.

A. Kurakin, I. J. Goodfellow, and S. Bengio, Adversarial examples in the physical world, 2016.

M. Kusner, S. Tyree, and K. Q. Weinberger, Stochastic neighbor compression, Proceedings of the 31st international conference on machine learning (ICML-14), pp.622-630, 2014.

S. Lacoste-Julien and M. Jaggi, On the global linear convergence of Frank-Wolfe optimization variants, Advances in Neural Information Processing Systems, pp.496-504, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248675

S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-Coordinate Frank-Wolfe Optimization for Structural SVMs, ICML, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00720158

L. Ladicky and P. Torr, Locally linear support vector machines, Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp.985-992, 2011.

J. Lafond, H. Wai, and E. Moulines, D-FW: Communication efficient distributed algorithms for high-dimensional sparse optimization, ICASSP, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01419048

R. Lajugie, D. Garreau, F. R. Bach, and S. Arlot, Metric learning for temporal sequence alignment, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pp.1817-1825, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01062130

A. Lazarevic and Z. Obradovic, The distributed boosting algorithm, KDD, 2001.

R. Lebret and R. Collobert, Word embeddings through Hellinger PCA, 2013.

Y. LeCun and Y. Bengio, Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks, pp.255-258, 1998.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, pp.436-444, 2015.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, vol.86, pp.2278-2324, 1998.

Y. LeCun and C. Cortes, MNIST handwritten digit database, 2010.

J. Lehman, J. Clune, D. Misevic, C. Adami, J. Beaulieu et al., The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01735473

J. Li, T. Arai, Y. Baba, H. Kashima, and S. Miwa, Distributed Multi-task Learning for Sensor Network, ECML/PKDD, 2017.

Y. Li, I. W. Tsang, J. T. Kwok, and Z. Zhou, Convex and scalable weakly labeled SVMs, Journal of Machine Learning Research, vol.14, pp.2151-2188, 2013.

X. Lian, C. Zhang, H. Zhang, C. Hsieh, W. Zhang et al., Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent, NIPS, 2017.

M. Lichman, UCI machine learning repository, 2013.

S. S. Liew, M. Khalil-Hani, and R. Bakhteri, Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems, vol.216, pp.718-734, 2016.

L. T. Liu, S. Dean, E. Rolf, M. Simchowitz, and M. Hardt, Delayed impact of fair machine learning, ICML, 2018.

Q. Liu and D. Wang, Stein variational gradient descent: A general purpose bayesian inference algorithm, Advances In Neural Information Processing Systems, pp.2370-2378, 2016.

S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, vol.28, issue.2, pp.129-137, 1982.

L. H. Loomis, Introduction to abstract harmonic analysis, Courier Corporation, 2013.

D. Lowd and C. Meek, Adversarial learning, ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD), KDD '05, pp.641-647, 2005.

C. Ma, J. Konečný, M. Jaggi, V. Smith, M. I. Jordan et al., Distributed optimization with arbitrary local solvers, vol.32, pp.813-848, 2017.

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, Towards deep learning models resistant to adversarial attacks, 2017.

Y. Mansour and M. Schain, Robust domain adaptation, vol.71, pp.365-380, 2014.

L. Mason, J. Baxter, P. L. Bartlett, and M. R. Frean, Boosting algorithms as gradient descent, Advances in neural information processing systems, pp.512-518, 2000.

H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., Communication-efficient learning of deep networks from decentralized data, AISTATS, 2017.

J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff, On detecting adversarial perturbations, 2017.

C. A. Micchelli and M. Pontil, On learning vector-valued functions, vol.17, pp.177-204, 2005.

H. Q. Minh, L. Bazzani, and V. Murino, A unifying framework for vector-valued manifold regularization and multi-view learning, ICML (2), pp.100-108, 2013.

H. Q. Minh, L. Bazzani, and V. Murino, A unifying framework in vector-valued reproducing kernel Hilbert spaces for manifold regularization and co-regularized multi-view learning, vol.17, pp.1-72, 2016.

T. Miyato, S. Maeda, M. Koyama, and S. Ishii, Virtual adversarial training: a regularization method for supervised and semi-supervised learning, IEEE transactions, 2018.

S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, Universal adversarial perturbations, 2016.

S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, and S. Soatto, Analysis of universal adversarial perturbations, 2017.

S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, DeepFool: a simple and accurate method to fool deep neural networks, 2015.

E. Morvant, A. Habrard, and S. Ayache, Parsimonious unsupervised and semi-supervised domain adaptation with good similarity functions, vol.33, pp.309-349, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00686205

N. Natarajan, I. S. Dhillon, P. K. Ravikumar, and A. Tewari, Learning with noisy labels, Advances in neural information processing systems, pp.1196-1204, 2013.

A. Nedic and A. E. Ozdaglar, Distributed Subgradient Methods for Multi-Agent Optimization, vol.54, pp.48-61, 2009.

A. M. Nguyen, J. Yosinski, and J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, 2014.

M. Nicolae, M. Sebban, A. Habrard, É. Gaussier, and M. Amini, Algorithmic robustness for semi-supervised (ε, γ, τ)-good metric learning, 2014.

M. Nicolae, M. Sinn, M. N. Tran, A. Rawat, M. Wistuba et al., Adversarial robustness toolbox v0, 2018.

R. Nock and F. Nielsen, Bregman divergences and surrogates for learning, vol.31, pp.2048-2059, 2009.

Y. Noh, B. Zhang, and D. D. Lee, Generative local metric learning for nearest neighbor classification, Advances in Neural Information Processing Systems, pp.1822-1830, 2010.

B. O'Donoghue, E. Chu, N. Parikh, and S. Boyd, Conic optimization via operator splitting and homogeneous self-dual embedding, 2015.

N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik et al., Practical black-box attacks against machine learning, Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pp.506-519, 2017.

N. Papernot and P. D. McDaniel, On the effectiveness of defensive distillation, 2016.

N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik et al., The limitations of deep learning in adversarial settings, 2015.

N. Papernot, P. D. McDaniel, X. Wu, S. Jha, and A. Swami, Distillation as a defense to adversarial perturbations against deep neural networks, 2015.

G. Patrini, F. Nielsen, R. Nock, and M. Carioni, Loss factorization, weakly supervised learning and label noise robustness, 2016.

G. Patrini, R. Nock, T. Caetano, and P. Rivera, (Almost) no label no cry, Advances in Neural Information Processing Systems, pp.190-198, 2014.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

M. Perrot, A. Habrard, D. Muselet, and M. Sebban, Modeling perceptual color differences by local metric learning, European conference on computer vision, pp.96-111, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01009610

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in neural information processing systems, pp.1177-1184, 2008.

C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning, vol.38, pp.715-719, 2006.

L. Rosasco, E. De Vito, A. Caponnetto, M. Piana, and A. Verri, Are loss functions all the same?, vol.16, pp.1063-1076, 2004.

A. Rosenfeld and J. K. Tsotsos, Intriguing properties of randomly weighted networks: Generalizing while learning next to nothing, 2018.

S. Ruder, An overview of gradient descent optimization algorithms, 2016.

R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, Boosting the margin: A new explanation for the effectiveness of voting methods, The Annals of Statistics, vol.26, pp.1651-1686, 1998.

B. Schölkopf, R. Herbrich, and A. J. Smola, A generalized representer theorem, International conference on computational learning theory, pp.416-426, 2001.

B. Schölkopf, A. Smola, and K. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural computation, vol.10, issue.5, pp.1299-1319, 1998.

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2002.

O. Shamir and N. Srebro, Distributed Stochastic Optimization and Learning, 2014.

G. Sharma, W. Wu, and E. N. Dalal, The CIEDE2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations, vol.30, pp.21-30, 2005.

C. Shen and H. Li, On the dual formulation of boosting algorithms, vol.32, pp.2216-2231, 2010.

V. S. Sheng, F. Provost, and P. G. Ipeirotis, Get another label? Improving data quality and data mining using multiple, noisy labelers, Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.614-622, 2008.

R. Shwartz-Ziv and N. Tishby, Opening the black box of deep neural networks via information, 2017.

V. Smith, C. Chiang, M. Sanjabi, and A. S. Talwalkar, Federated Multi-Task Learning, NIPS, 2017.

Y. Song, Z. Lu, C. W. Leung, and Q. Yang, Collaborative boosting for activity classification in microblogs, KDD, 2013.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

I. Steinwart, Sparseness of support vector machines, vol.4, pp.1071-1105, 2003.

S. Sun, Multi-view Laplacian support vector machines, International Conference on Advanced Data Mining and Applications, pp.209-222, Springer, 2011.

S. Sun, A survey of multi-view machine learning, vol.23, pp.2031-2038, 2013.

A. T. Suresh, F. X. Yu, S. Kumar, and H. B. McMahan, Distributed Mean Estimation with Limited Communication, 2017.

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 2011.

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan et al., Intriguing properties of neural networks, 2013.

H. Tang, X. Lian, M. Yan, C. Zhang, and J. Liu, D²: Decentralized training over decentralized data, 2018.

H. Tang, C. Zhang, S. Gan, T. Zhang, and J. Liu, Decentralization Meets Quantization, 2018.

J. Tang, Y. Tian, P. Zhang, and X. Liu, Multiview privileged support vector machines, 2017.

C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.847-855, 2013.

M. Tkalcic and J. F. Tasic, Colour spaces: perceptual, historical and applicational background, Eurocon, pp.304-308, 2003.

F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel, Ensemble adversarial training: Attacks and defenses, 2017.

A. Trivedi, P. Rai, H. Daumé III, and S. L. DuVall, Multiview clustering with incomplete views, NIPS Workshop, 2010.

D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry, There is no free lunch in adversarial robustness, 2018.

L. G. Valiant, A theory of the learnable, Communications of the ACM, vol.27, issue.11, pp.1134-1142, 1984.

A. W. van der Vaart and J. A. Wellner, Weak convergence, Weak Convergence and Empirical Processes, pp.16-28, 1996.

J. C. van Gemert, J. Geusebroek, C. J. Veenman, and A. W. Smeulders, Kernel codebooks for scene categorization, European conference on computer vision, pp.696-709, 2008.

P. Vanhaesebrouck, A. Bellet, and M. Tommasi, Decentralized Collaborative Learning of Personalized Models over Networks, AISTATS, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01533182

V. Vapnik, Estimation of dependences based on empirical data, 2006.

V. N. Vapnik and A. Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Measures of complexity, pp.11-30, 2015.

M. Varma, Local deep kernel learning for efficient non-linear svm prediction, International Conference on Machine Learning, pp.486-494, 2013.

C. Wang, Y. Wang, and R. Schapire, Functional Frank-Wolfe Boosting for General Loss Functions, 2015.

J. Wang, M. Kolar, and N. Srebro, Distributed Multi-Task Learning with Shared Representation, 2016.

J. Wang, M. Kolar, and N. Srebro, Distributed Multitask Learning, AISTATS, 2016.

J. Wang and V. Saligrama, Local supervised learning through space partitioning, Advances in Neural Information Processing Systems, pp.91-99, 2012.

J. Wang, A. Kalousis, and A. Woznica, Parametric local metric learning for nearest neighbor classification, Advances in Neural Information Processing Systems, pp.1601-1609, 2012.

S. Wang and C. Zhang, Network game and boosting, ECML, 2005.

D. Warde-Farley and I. Goodfellow, Adversarial perturbations of deep neural networks, Perturbation, Optimization, and Statistics, 2016.

E. Wei and A. E. Ozdaglar, Distributed Alternating Direction Method of Multipliers, CDC, 2012.

K. Q. Weinberger, J. Blitzer, and L. K. Saul, Distance metric learning for large margin nearest neighbor classification, Advances in neural information processing systems, pp.1473-1480, 2005.

K. Q. Weinberger and L. K. Saul, Distance metric learning for large margin nearest neighbor classification, Journal of Machine Learning Research, vol.10, pp.207-244, 2009.

T. Weng, H. Zhang, P. Chen, J. Yi, D. Su et al., Evaluating the robustness of neural networks: An extreme value theory approach, 2018.

C. K. I. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in neural information processing systems, pp.682-688, 2001.

E. P. Xing, M. I. Jordan, S. Russell, and A. Y. Ng, Distance metric learning with application to clustering with side-information, Advances in neural information processing systems, pp.505-512, 2002.

C. Xu, D. Tao, and C. Xu, A survey on multi-view learning, 2013.

H. Xu and S. Mannor, Robustness and generalization, vol.86, pp.391-423, 2012.

W. Xu, D. Evans, and Y. Qi, Feature squeezing: Detecting adversarial examples in deep neural networks, 2017.

W. Xu, D. Evans, and Y. Qi, Feature squeezing mitigates and detects Carlini/Wagner adversarial examples, 2017.

L. Yang and R. Jin, Distance metric learning: A comprehensive survey, vol.2, 2006.

K. Yu, T. Zhang, and Y. Gong, Nonlinear learning using local coordinate coding, Advances in Neural Information Processing Systems, vol.22, pp.2223-2231, 2009.

E. Yudkowsky, Artificial intelligence as a positive and negative factor in global risk, Global catastrophic risks, vol.1, issue.303, p.184, 2008.

V. Zantedeschi, A. Bellet, and M. Tommasi, Decentralized Frank-Wolfe boosting for collaborative learning of personalized models, CAp, 2018.

V. Zantedeschi, R. Emonet, and M. Sebban, Apprentissage de combinaisons convexes de métriques locales avec garanties de généralisation, 2016.

V. Zantedeschi, R. Emonet, and M. Sebban, Beta-risk: a new surrogate risk for learning from weakly labeled data, Advances in Neural Information Processing Systems, pp.4365-4373, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01359298

V. Zantedeschi, R. Emonet, and M. Sebban, Lipschitz continuity of Mahalanobis distances and bilinear forms, 2016.

V. Zantedeschi, R. Emonet, and M. Sebban, Metric learning as convex combinations of local models with generalization guarantees, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1478-1486, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01323567

V. Zantedeschi, R. Emonet, and M. Sebban, L³-SVMs: Landmark-based linear local support vector machines, CAp, 2017.

V. Zantedeschi, R. Emonet, and M. Sebban, Fast and provably effective multi-view classification with landmark-based SVM, ECML PKDD, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01863895

V. Zantedeschi, M. Nicolae, and A. Rawat, Efficient defenses against adversarial attacks, Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp.39-49, 2017.

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, 2016.

C. Zhang, M. Ahmad, and Y. Wang, ADMM based privacy-preserving decentralized optimization, 2018.

Y. Zhang, J. C. Duchi, M. I. Jordan, and M. J. Wainwright, Information-theoretic lower bounds for distributed statistical estimation with communication constraints, NIPS, 2013.

Y. Zhang, M. J. Wainwright, and J. C. Duchi, Communication-efficient algorithms for statistical optimization, NIPS, 2012.

J. Zhao, X. Xie, X. Xu, and S. Sun, Multi-view learning overview: Recent progress and new challenges, vol.38, pp.43-54, 2017.

X. Zhou, N. Cui, Z. Li, F. Liang, and T. S. Huang, Hierarchical Gaussianization for image classification, IEEE 12th International Conference on, pp.1971-1977, 2009.

X. Zhu, Semi-supervised learning literature survey, Computer Sciences, 2005.