D. Achlioptas, Database-friendly random projections, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '01, pp.274-281, 2001.
DOI : 10.1145/375551.375608

URL : http://www.research.microsoft.com/~optas/papers/jl.ps

C. Andrieu and A. Doucet, Online expectation-maximization type algorithms for parameter estimation in general state space models, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.
DOI : 10.1109/ICASSP.2003.1201620

P. Ahrendt, The Multivariate Gaussian Probability Distribution, 2005.

V. Alba Fernández, M. D. Jiménez Gamero, and J. Muñoz García, A test for the two-sample problem based on empirical characteristic functions, Computational Statistics and Data Analysis, vol.52, issue.7, pp.3730-3748, 2008.

N. Ailon, R. Jaiswal, and C. Monteleoni, Streaming k-means approximation, Advances in Neural Information Processing Systems (NIPS), pp.10-18, 2009.

D. Aloise, A. Deshpande, P. Hansen, and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning, vol.75, issue.2, pp.245-248, 2009.
DOI : 10.1007/s10994-009-5103-0

D. Achlioptas and F. McSherry, On Spectral Learning of Mixtures of Distributions, In: Learning Theory, pp.458-469, 2005.
DOI : 10.1007/11503415_31

J. Anderson, M. Belkin, N. Goyal, L. Rademacher, and J. Voss, The more, the merrier: the blessing of dimensionality for learning large Gaussian mixtures, pp.1-30, 2013.

N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.
DOI : 10.1090/S0002-9947-1950-0051437-7

S. Alelyani, J. Tang, and H. Liu, Feature Selection for Clustering: A Review, Data Clustering: Algorithms and Applications, pp.29-60, 2016.

D. Arthur and S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, ACM-SIAM symposium on Discrete algorithms, pp.1027-1035, 2007.

F. Bach, On the Equivalence between Quadrature Rules and Random Features, pp.1-25, 2015.

R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A Simple Proof of the Restricted Isometry Property for Random Matrices, Constructive Approximation, vol.28, issue.3, pp.253-263, 2008.
DOI : 10.1007/s00365-007-9003-x

R. Baraniuk, Compressive sensing, 2008 42nd Annual Conference on Information Sciences and Systems, pp.118-121, 2007.
DOI : 10.1109/CISS.2008.4558479

URL : https://hal.archives-ouvertes.fr/hal-00452261

T. Blumensath and M. E. Davies, Gradient pursuit for non-linear sparse signal modelling, European Signal Processing Conference (EUSIPCO), 2008.

T. Blumensath and M. E. Davies, Iterative Thresholding for Sparse Approximations, Journal of Fourier Analysis and Applications, vol.14, issue.5-6, pp.629-654, 2008.
DOI : 10.1017/CBO9780511810817

T. Blumensath and M. E. Davies, Iterative hard thresholding for compressed sensing, Applied and Computational Harmonic Analysis, vol.27, issue.3, pp.265-274, 2009.
DOI : 10.1016/j.acha.2009.04.002

URL : https://doi.org/10.1016/j.acha.2009.04.002

S. Becker, In: https://github, 2013.

A. Bourrier, R. Gribonval, and P. Pérez, Compressive Gaussian mixture estimation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6024-6028, 2013.
DOI : 10.1109/icassp.2013.6638821

URL : https://hal.archives-ouvertes.fr/hal-00799896

A. Bourrier, R. Gribonval, and P. Pérez, Compressive Gaussian Mixture estimation, In: Compressed Sensing and its Applications - MATHEON Workshop 2013, pp.6024-6028, 2015.
DOI : 10.1109/icassp.2013.6638821

URL : https://hal.archives-ouvertes.fr/hal-00799896

N. Bassiou, C. Kotropoulos, and E. Koliopoulou, Symmetric alpha-Stable Sparse Linear Regression for Musical Audio Denoising, pp.375-380, 2013.
DOI : 10.1109/ispa.2013.6703771

F. R. Bach, G. R. Lanckriet, and M. I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Twenty-first international conference on Machine learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015424

URL : http://www.cs.berkeley.edu/~jordan/papers/skm_icml.pdf

T. Blumensath, Sampling and Reconstructing Signals From a Union of Linear Subspaces, IEEE Transactions on Information Theory, vol.57, issue.7, pp.4660-4671, 2011.
DOI : 10.1109/TIT.2011.2146550

K. M. Borgwardt, A. Gretton, M. J. Rasch, H. P. Kriegel, B. Schölkopf et al., Integrating structured biological data by Kernel Maximum Mean Discrepancy, Bioinformatics, vol.22, issue.14, pp.49-57, 2006.
DOI : 10.1093/bioinformatics/btl242

URL : https://academic.oup.com/bioinformatics/article-pdf/22/14/e49/616383/btl242.pdf

A. Bourrier, M. E. Davies, T. Peleg, and R. Gribonval, Fundamental Performance Limits for Ideal Decoders in High-Dimensional Linear Inverse Problems, IEEE Transactions on Information Theory, vol.60, issue.12, pp.7928-7946, 2014.
DOI : 10.1109/TIT.2014.2364403

URL : https://hal.archives-ouvertes.fr/hal-00908358

A. Bourrier, Compressed sensing and dimensionality reduction for unsupervised learning
URL : https://hal.archives-ouvertes.fr/tel-01023030

K. Bredies and H. K. Pikkarainen, Inverse problems in spaces of measures, ESAIM: Control, Optimisation and Calculus of Variations, vol.19, issue.1, pp.190-218, 2012.
DOI : 10.1051/cocv/2011205

E. Bashan, R. Raich, and A. O. Hero, Optimal Two-Stage Search for Sparse Targets Using Convex Criteria, IEEE Transactions on Signal Processing, vol.56, issue.11, pp.5389-5402, 2008.
DOI : 10.1109/TSP.2008.929114

URL : https://www.eecs.umich.edu/~hero/Preprints/OptimalTwo-Stage.pdf

P. T. Boufounos, S. Rane, and H. Mansour, Representation and Coding of Signal Geometry, pp.1-28, 2015.

S. A. Broda, M. Haas, J. Krause, M. S. Paolella, and S. C. Steude, Stable mixture GARCH models, Journal of Econometrics, vol.172, issue.2, pp.292-306, 2013.

M. Brookes, VOICEBOX: Speech Processing Toolbox for MATLAB, 2005.

L. Bo and C. Sminchisescu, Efficient Match Kernels between Sets of Features for Visual Recognition, Advances in Neural Information Processing Systems (NIPS), 2009.

M. Belkin and K. Sinha, Polynomial learning of distribution families, IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS), 2010.
DOI : 10.1137/13090818x

URL : http://www.cse.ohio-state.edu/%7Esinhak/PLDF_FOCS_10.pdf

M. Belkin and K. Sinha, Toward Learning Gaussian Mixtures with Arbitrary Separation, Conference on Learning Theory (COLT), 2010.

N. Boyd, G. Schiebinger, and B. Recht, The Alternating Descent Conditional Gradient Method for Sparse Inverse Problems, pp.1-21, 2015.
DOI : 10.1109/camsap.2015.7383735

URL : http://arxiv.org/pdf/1507.01562

A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics, 2004.
DOI : 10.1007/978-1-4419-9096-9

F. Bunea, A. B. Tsybakov, M. H. Wegkamp, and A. Barbu, SPADES and mixture models, The Annals of Statistics, pp.2525-2558, 2010.
DOI : 10.1214/09-AOS790

URL : https://hal.archives-ouvertes.fr/hal-00514124

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A Limited Memory Algorithm for Bound Constrained Optimization, SIAM Journal on Scientific Computing, vol.16, issue.5, pp.1190-1208, 1995.

C. Boutsidis, A. Zouzias, and P. Drineas, Random Projections for k-means Clustering, Advances in Neural Information Processing Systems (NIPS), pp.298-306, 2010.

E. J. Candès, The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathematique, vol.346, issue.9-10, pp.589-592, 2008.
DOI : 10.1016/j.crma.2008.03.014

R. Casarin, Bayesian Inference for Mixtures of Stable Distributions, SSRN Electronic Journal, pp.1-50, 2004.
DOI : 10.2139/ssrn.739791

A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best $k$-term approximation, Journal of the American Mathematical Society, vol.22, issue.1, pp.211-231, 2009.
DOI : 10.1090/S0894-0347-08-00610-3

URL : http://www.igpm.rwth-aachen.de/Download/reports/pdf/IGPM260.pdf

M. Carrasco and J. Florens, Generalization of GMM to a continuum of moment conditions, Econometric Theory, vol.16, issue.6, 2000.
DOI : 10.1017/S0266466600166010

M. Carrasco and J. Florens, Efficient GMM estimation using the empirical characteristic function, 2002.
DOI : 10.2139/ssrn.1470423

URL : http://www.cirano.qc.ca/pdf/publication/2013s-22.pdf

M. Carrasco and J. Florens, On the asymptotic efficiency of GMM, Econometric Theory, vol.30, issue.2, pp.372-406, 2014.
DOI : 10.2307/1913241

E. J. Candès and C. Fernandez-Granda, Super-Resolution from Noisy Data, pp.1-22, 2012.

D. A. Cohn, Z. Ghahramani, and M. I. Jordan, Active Learning with Statistical Models, Journal of Artificial Intelligence Research, vol.4, pp.129-145, 1996.

G. Cormode and M. Hadjieleftheriou, Methods for finding frequent items in data streams, The VLDB Journal, vol.15, issue.5, pp.3-20, 2009.
DOI : 10.1080/15427951.2004.10129079

URL : http://www.research.att.com/people/Cormode_Graham/library/publications/CormodeHadjieleftheriou09b.pdf

V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, The Convex Geometry of Linear Inverse Problems, Foundations of Computational Mathematics, vol.1, issue.10, 2012.
DOI : 10.1007/978-1-4613-8431-1

O. Chabiron, F. Malgouyres, J. Y. Tourneret, and N. Dobigeon, Toward Fast Transform Learning, International Journal of Computer Vision, vol.60, issue.12, pp.2-3, 2015.
DOI : 10.1109/TSP.2012.2218241

URL : https://hal.archives-ouvertes.fr/hal-00862903

A. Chatalic, Towards Scalable Sketched Learning

K. Chwialkowski, A. Ramdas, D. Sejdinovic, and A. Gretton, Fast Two-Sample Testing with Analytic Representations of Probability Measures, Advances in Neural Information Processing Systems (NIPS), 2015.

R. Chitta, R. Jin, and A. K. Jain, Efficient Kernel Clustering Using Random Fourier Features, 2012 IEEE 12th International Conference on Data Mining, pp.161-170, 2012.
DOI : 10.1109/ICDM.2012.61

URL : http://biometrics.cse.msu.edu/Publications/Clustering/ChittaJinJain_EfficientKernelClusteringUsingRandomFourierFeatures_ICDM12.pdf

R. Calderbank, S. Jafarpour, and R. Schapire, Compressed learning: Universal sparse dimensionality reduction and learning in the measurement domain, 2009.

G. Cormode and S. Muthukrishnan, An improved data stream summary: the count-min sketch and its applications, Journal of Algorithms, vol.55, issue.1, pp.58-75, 2005.
DOI : 10.1007/978-3-540-24698-5_7

URL : http://dimacs.rutgers.edu/~graham/pubs/papers/cm-latin.pdf

O. Cappé and E. Moulines, Online EM Algorithm for Latent Data Models, Journal of the Royal Statistical Society, vol.71, issue.3, pp.593-613, 2009.

M. B. Cohen, S. Elder, C. Musco, C. Musco, and M. Persu, Dimensionality Reduction for k-Means Clustering and Low Rank Approximation, Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC '15, p.37
DOI : 10.1093/qmath/11.1.50

URL : http://arxiv.org/pdf/1410.6801.pdf

G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava, Diamond in the rough, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, SIGMOD '04, pp.155-166, 2004.
DOI : 10.1145/1007568.1007588

G. Cormode, M. Garofalakis, P. J. Haas, and C. Jermaine, Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches, Foundations and Trends in Databases, vol.4, issue.1-3, pp.1-294, 2011.
DOI : 10.1561/1900000004

E. J. Candès and Y. Plan, Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements, IEEE Transactions on Information Theory, vol.57, issue.4, pp.2342-2359, 2011.

E. J. Candès, J. K. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, vol.52, issue.2, pp.480-509, 2006.
DOI : 10.1109/TIT.2005.862083

E. J. Candès, J. K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics, vol.59, issue.8, pp.1207-1223, 2006.
DOI : 10.1017/CBO9780511804441

K. Choromanski, M. Rowland, and A. Weller, The Unreasonable Effectiveness of Random Orthogonal Embeddings, 2017.

F. Cucker and S. Smale, On the mathematical foundations of learning, Bulletin of the American Mathematical Society, vol.39, issue.01, pp.1-49, 2002.
DOI : 10.1090/S0273-0979-01-00923-5

B. Chandra and R. K. Sharma, Fast learning in Deep Neural Networks, Neurocomputing, vol.171, pp.1205-1215, 2016.
DOI : 10.1016/j.neucom.2015.07.093

E. J. Candès and T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory, vol.51, issue.12, pp.4203-4215, 2004.

E. J. Candès and T. Tao, Near-optimal signal recovery from random projections: Universal encoding strategies, IEEE Transactions on Information Theory, vol.52, issue.12, pp.5406-5425, 2006.

E. J. Candès and T. Tao, The power of convex relaxation: Near-optimal matrix completion, IEEE Transactions on Information Theory, vol.56, issue.5, pp.2053-2080, 2010.

K. Choromanski and V. Sindhwani, Recycling Randomness with Structure for Sublinear time Kernel Expansions, 2016.

A. Dasgupta, J. Hopcroft, J. Kleinberg, and M. Sandler, On Learning Mixtures of Heavy-Tailed Distributions, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05), pp.491-500, 2005.
DOI : 10.1109/SFCS.2005.56

S. Dasgupta, Learning mixtures of Gaussians, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039), 1999.
DOI : 10.1109/SFFCS.1999.814639

Y. De Castro, F. Gamboa, D. Henrion, and J. Lasserre, Exact solutions to Super Resolution on semi-algebraic domains in higher dimensions, pp.1-22, 2015.

A. Deleforge, F. Forbes, and R. Horaud, High-dimensional regression with gaussian mixtures and partially-latent response variables, Statistics and Computing, vol.19, issue.11, pp.893-911, 2014.
DOI : 10.1109/TNN.2008.2003467

URL : https://hal.archives-ouvertes.fr/hal-01107604

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, vol.39, issue.1, pp.1-38, 1977.

D. L. Donoho, Compressed sensing, IEEE Transactions on Information Theory, vol.52, issue.4, pp.1289-1306, 2006.

D. L. Donoho, 50 years of Data Science, pp.1-41, 2015.

V. Duval and G. Peyré, Exact Support Recovery for Sparse Spikes Deconvolution, Foundations of Computational Mathematics, vol.15, issue.5, pp.1315-1355, 2015.
DOI : 10.1109/TIT.2012.2233859

URL : https://hal.archives-ouvertes.fr/hal-00839635

J. Duchi, Derivations for Linear Algebra and Optimization, 2007.

D. Feldman, M. Monemizadeh, C. Sohler, and D. P. Woodruff, Coresets and Sketches for High Dimensional Subspace Approximation Problems, pp.630-649, 2010.
DOI : 10.1137/1.9781611973075.53

URL : http://www.almaden.ibm.com/cs/people/dpwoodru/fmsw10.pdf

D. Feldman, M. Faulkner, and A. Krause, Scalable Training of Mixture Models via Coresets, Proceedings of Neural Information Processing Systems, pp.1-9, 2011.

A. A. Fedotov, P. Harremoës, and F. Topsøe, Refinements of Pinsker's Inequality, IEEE Transactions on Information Theory, vol.49, issue.6, pp.1491-1498, 2003.

T. Fischer, Existence, uniqueness, and minimality of the Jordan measure decomposition, 2012.

D. Feldman and M. Langberg, A unified framework for approximating and clustering data, Proceedings of the 43rd annual ACM symposium on Theory of computing, STOC '11, pp.569-578, 2011.
DOI : 10.1145/1993636.1993712

URL : http://people.csail.mit.edu/dannyf/stoc11.pdf

D. Fradkin and D. Madigan, Experiments with random projections for machine learning, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '03, pp.517-522, 2003.
DOI : 10.1145/956750.956812

URL : http://dimacspc6.rutgers.edu/~dfradkin/papers/rp.pdf

A. Feuerverger and R. Mureika, The Empirical Characteristic Function and Its Applications, The Annals of Statistics, 1977.
DOI : 10.1214/aos/1176343742

A. Feuerverger and P. McDunnough, On Some Fourier Methods for Inference, Journal of the American Statistical Association, vol.76, issue.374, pp.379-387, 1981.
DOI : 10.1007/BF02024507

S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Applied and Numerical Harmonic Analysis, 2013.
DOI : 10.1007/978-0-8176-4948-7

G. Frahling and C. Sohler, A fast k-means implementation using coresets, Proceedings of the twenty-second annual symposium on Computational geometry (SoCG), pp.605-625, 2005.
DOI : 10.1145/1137856.1137879

URL : http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/33320.pdf

K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf, Kernel Measures of Conditional Dependence, Advances in Neural Information Processing Systems (NIPS), 2007.

A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss, How to Summarize the Universe, International Conference on Very Large Data Bases (VLDB), pp.454-465, 2002.
DOI : 10.1016/B978-155860869-6/50047-0

S. Godsill and E. E. Kuruoglu, Bayesian inference for time series with heavy-tailed symmetric alpha-stable noise processes, pp.1-28, 1999.

M. Ghesmoune, M. Lebbah, and H. Azzag, State-of-the-art on clustering data streams, Big Data Analytics, vol.1, p.13, 2016.
DOI : 10.1201/EBK1439826119

M. Ghashami, D. Perry, and J. M. Phillips, Streaming Kernel Principal Component Analysis, International Conference on Artificial Intelligence and Statistics, pp.1-16, 2016.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. J. Smola, A Kernel Method for the Two-Sample Problem, Advances in Neural Information Processing Systems (NIPS), 2006.

A. Gretton, B. K. Sriperumbudur, D. Sejdinovic, H. Strathmann et al., Optimal kernel choice for large-scale two-sample tests, Advances in Neural Information Processing Systems (NIPS), pp.1214-1222, 2012.

R. Gribonval, G. Blanchard, N. Keriven, and Y. Traonmilin, Compressive Statistical Learning with Random Feature Moments, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01544609

R. Gribonval, E. Bacry, S. Mallat, P. Depalle, and X. Rodet, Analysis of sound signals with high resolution matching pursuit, Proceedings of Third International Symposium on Time-Frequency and Time-Scale Analysis (TFTS-96), pp.125-128, 1996.
DOI : 10.1109/TFSA.1996.546702

URL : https://hal.archives-ouvertes.fr/inria-00576196

R. Giryes, G. Sapiro, and A. M. Bronstein, Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?, IEEE Transactions on Signal Processing, vol.64, issue.13, 2015.
DOI : 10.1109/TSP.2016.2546221

URL : http://arxiv.org/pdf/1504.08291

S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, Clustering Data Streams, Proc. Ann. Symp. Foundations of Computer Science, 2000.

A. R. Hall, Generalized method of moments, 2005.

J. Haupt, R. Castro, and R. Nowak, Distilled Sensing: Adaptive Sampling for Sparse Detection and Estimation, IEEE Transactions on Information Theory, vol.57, issue.9, pp.6222-6235, 2011.
DOI : 10.1109/TIT.2011.2162269

URL : http://nowak.ece.wisc.edu/DS_arxiv.pdf

D. Hsu and S. M. Kakade, Learning mixtures of spherical Gaussians, Proceedings of the 4th conference on Innovations in Theoretical Computer Science, ITCS '13
DOI : 10.1145/2422436.2422439

S. Har-Peled and S. Mazumdar, Coresets for k-Means and k-Median Clustering and their Applications, Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp.291-300, 2004.
DOI : 10.1145/1007352.1007400

W. Jitkrittum, Z. Szabó, K. Chwialkowski, and A. Gretton, Interpretable Distribution Features with Maximum Testing Power, Advances in Neural Information Processing Systems (NIPS), pp.1-21, 2016.

S. Joshi, R. V. Kommaraju, J. M. Phillips, and S. Venkatasubramanian, Comparing distributions and shapes using the kernel distance, Proceedings of the 27th annual ACM symposium on Computational geometry, SoCG '11, pp.47-56
DOI : 10.1145/1998196.1998204

URL : http://www.cs.utah.edu/~jeffp/papers/arXiv1001.0591.pdf

P. Jain, A. Tewari, and I. S. Dhillon, Orthogonal matching pursuit with replacement, Advances in Neural Information Processing Systems (NIPS), pp.1215-1223, 2011.

M. Kapralov, V. K. Potluru, and D. P. Woodruff, How to Fake Multiply by a Gaussian Matrix, International Conference on Machine Learning (ICML), pp.1-37, 2016.

N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez, Sketching for large-scale learning of mixture models, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
DOI : 10.1109/ICASSP.2016.7472867

URL : https://hal.archives-ouvertes.fr/hal-01208027

N. Keriven, N. Tremblay, Y. Traonmilin, and R. Gribonval, Compressive K-means, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI : 10.1109/ICASSP.2017.7953382

URL : https://hal.archives-ouvertes.fr/hal-01386077

N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez, Sketching for large-scale learning of mixture models, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1-50, 2017.
DOI : 10.1109/ICASSP.2016.7472867

URL : https://hal.archives-ouvertes.fr/hal-01208027

N. Keriven, SketchMLbox: a Matlab toolbox for large-scale learning of mixture models, 2016.

S. Kullback and R. A. Leibler, On Information and Sufficiency, The Annals of Mathematical Statistics, vol.22, issue.1, pp.79-86, 1951.
DOI : 10.1214/aoms/1177729694

URL : http://doi.org/10.1214/aoms/1177729694

S. Kring, S. T. Rachev, M. Höchstötter, and F. J. Fabozzi, Estimation of alpha-stable sub-Gaussian distributions for asset returns, Contributions to Economics, pp.111-152, 2009.

G. S. Kimeldorf and G. Wahba, A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines, The Annals of Mathematical Statistics, vol.41, pp.495-502, 1970.

J. Knight and J. Yu, Empirical Characteristic Function in Time Series Estimation, pp.691-721, 2002.
DOI : 10.1017/s026646660218306x

URL : http://yoda.eco.auckland.ac.nz/~jyu/ecfarma7.pdf

H. J. Landau, Moments in Mathematics, 1987.

G. Loosli, S. Canu, and L. Bottou, Training Invariant Support Vector Machines using Selective Sampling, pp.301-320, 2007.

Y. LeCun, C. Cortes, and C. J. Burges, The MNIST database of handwritten digits, 1998.

L. Le Magoarou, Matrices efficientes pour le traitement du signal et l'apprentissage automatique

C. S. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, Mismatch string kernels for discriminative protein classification, Bioinformatics, vol.20, issue.4, pp.467-476, 2004.
DOI : 10.1093/bioinformatics/btg431

URL : https://academic.oup.com/bioinformatics/article-pdf/20/4/467/476867/btg431.pdf

L. Le Magoarou and R. Gribonval, Flexible Multilayer Sparse Approximations of Matrices and Applications, IEEE Journal of Selected Topics in Signal Processing, vol.10, issue.4, pp.688-700, 2016.
DOI : 10.1109/JSTSP.2016.2543461

C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, SIAM Classics in Applied Mathematics, 1995.

S. P. Lloyd, Least Squares Quantization in PCM, IEEE Transactions on Information Theory, vol.28, issue.2, pp.129-137, 1982.

H. Lodhi, C. Saunders, J. Shawe-taylor, N. Cristianini, and C. Watkins, Text Classification using String Kernels, Journal of Machine Learning Research, vol.2, pp.419-444, 2002.

Q. V. Le, T. Sarlós, and A. J. Smola, Fastfood - Approximating Kernel Expansions in Loglinear Time, International Conference on Machine Learning (ICML), 2013.

M. Lucic, M. Faulkner, A. Krause, and D. Feldman, Training Mixture Models at Scale via Coresets, 2017.

D. W. Marquardt, An Algorithm for Least-Squares Estimation of Nonlinear Parameters, Journal of the Society for Industrial and Applied Mathematics, vol.11, issue.2, 1963.
DOI : 10.1137/0111030

B. Mailhé and R. Gribonval, LocOMP: algorithme localement orthogonal pour l'approximation parcimonieuse rapide de signaux longs sur des dictionnaires locaux, 2009.

M. Muja and D. G. Lowe, Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration, International Conference on Computer Vision Theory and Applications, 2009.

O. Maillard and R. Munos, Compressed Least-Squares Regression, Advances in Neural Information Processing Systems (NIPS), 2009.
URL : https://hal.archives-ouvertes.fr/inria-00419210

K. Madsen, H. Nielsen, and O. Tingleff, Methods for non-linear least squares problems, Informatics and Mathematical Modelling, vol.2, 2004.

K. Muandet, K. Fukumizu, F. Dinuzzo, and B. Schölkopf, Learning from distributions via support measure machines, Advances in Neural Information Processing Systems (NIPS), 2012.

K. Muandet, K. Fukumizu, B. K. Sriperumbudur, A. Gretton et al., Kernel Mean Estimation and Stein's Effect, 31st International Conference on Machine Learning, pp.10-18, 2014.

Y. Mukuta, Kernel Approximation via Empirical Orthogonal Decomposition for Unsupervised Feature Learning, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5222-5230
DOI : 10.1109/CVPR.2016.564

A. Müller, Integral Probability Metrics and Their Generating Classes of Functions, Advances in Applied Probability, vol.29, issue.2, pp.429-443, 1997.

S. G. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, vol.41, issue.12, pp.3397-3415, 1993.
DOI : 10.1109/78.258082

URL : http://home.ustc.edu.cn/~zhanghan/cs/Mallat_Zhang93.pdf

S. Nam, M. E. Davies, M. Elad, and R. Gribonval, The cosparse analysis model and algorithms, Applied and Computational Harmonic Analysis, vol.34, issue.1, pp.30-56, 2013.
DOI : 10.1016/j.acha.2012.03.006

URL : https://hal.archives-ouvertes.fr/inria-00602205

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems, pp.849-856, 2001.

W. K. Newey and D. McFadden, Large sample estimation and hypothesis testing, Handbook of Econometrics, vol.4, pp.2111-2245, 1994.

J. P. Nolan, Modeling Financial Data with Stable Distributions, pp.105-129, 2002.
DOI : 10.1016/B978-044450896-6.50005-4

URL : http://academic2.american.edu/%7Ejpnolan/stable/StableFinance23Mar2005.pdf

J. P. Nolan, Multivariate elliptically contoured stable distributions: theory and estimation, Computational Statistics, vol.55, issue.2, pp.2067-2089, 2013.
DOI : 10.1016/B978-0-12-274460-0.50029-1

URL : http://academic2.american.edu/%7Ejpnolan/stable/EllipticalStable.pdf

J. P. Nolan, A. K. Panorska, and J. H. McCulloch, Estimation of stable spectral measures, Mathematical and Computer Modelling, vol.34, issue.9-11, pp.1113-1122, 2001.
DOI : 10.1016/S0895-7177(01)00119-4

D. Needell and J. A. Tropp, CoSaMP, Communications of the ACM, vol.53, issue.12, pp.301-321, 2009.
DOI : 10.1145/1859204.1859229

J. B. Oliva, A. Dubey, B. Póczos, J. Schneider et al., Bayesian Nonparametric Kernel-Learning, International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.

V. Omelchenko, Parameter estimation of sub-Gaussian stable distributions, pp.929-949.

J. B. Oliva, D. J. Sutherland, and J. Schneider, Deep Mean Maps, 2015.

F. Pourkamali-Anaraki and S. Becker, Randomized Clustered Nystrom for Large-Scale Kernel Machines, pp.1-31, 2016.

G. Puy, M. E. Davies, and R. Gribonval, Linear embeddings of low-dimensional subsets of a Hilbert space to R^m, European Signal Processing Conference (EUSIPCO), pp.469-473, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01116153

T. Peleg, R. Gribonval, and M. E. Davies, Compressed Sensing and Best Approximation from Unions of Subspaces: Beyond Dictionaries, 21st European Signal Processing Conference, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00812858

Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, 1993.
DOI : 10.1109/ACSSC.1993.342465

URL : http://www.isr.umd.edu/~krishna/images/pati_reza_psk.pdf

B. Paige, D. Sejdinovic, and F. Wood, Super-Sampling with a Reservoir, Uncertainty in Artificial Intelligence, 2016.

J. Pennington and F. X. Yu, Spherical Random Features for Polynomial Kernels, 2015.

W. M. Rand, Objective Criteria for the Evaluation of Clustering Methods, Journal of the American Statistical Association, vol.66, issue.336, pp.846-850, 1971.

H. Rauhut, On the impossibility of uniform sparse reconstruction using greedy methods, In: Sampling Theory in Signal and Image, pp.1-15, 2008.

H. Reboredo, F. Renna, R. Calderbank, and M. R. Rodrigues, Compressive classification, 2013 IEEE International Symposium on Information Theory, pp.1029-1032
DOI : 10.1109/ISIT.2013.6620311

URL : http://arxiv.org/pdf/1302.4660.pdf

S. J. Reddi, A. Ramdas, B. Póczos, A. Singh et al., On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions, AAAI Conference on Artificial Intelligence, 2015.

B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization, SIAM Review, vol.52, issue.3, pp.1-33, 2007.
DOI : 10.1137/070697835

URL : http://arxiv.org/pdf/0706.4138

D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing, vol.10, issue.1-3, pp.19-41, 2000.

A. Rahimi and B. Recht, Random Features for Large Scale Kernel Machines, Advances in Neural Information Processing Systems (NIPS), 2007.

A. Rahimi and B. Recht, Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning, Advances in Neural Information Processing Systems (NIPS), 2009.

D. A. Reynolds and R. C. Rose, Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Transactions on Speech and Audio Processing, vol.3, issue.1, pp.72-83, 1995.

W. Rudin, Fourier Analysis on Groups, 1962.
DOI : 10.1002/9781118165621

W. Rudin, Real and Complex Analysis, McGraw-Hill, 1987.

W. Rudin, Functional Analysis, 1991.

A. Saade, F. Caltagirone, I. Carron, L. Daudet, A. Dremeau et al., Random projections through multiple optical scattering: Approximating Kernels at the speed of light, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6215-6219, 2016.
DOI : 10.1109/ICASSP.2016.7472872

V. Schellekens, Compressive Clustering of High-Dimensional Datasets by 1-bit Sketching

A. Sinha and J. Duchi, Learning Kernels with Random Features, Advances in Neural Information Processing Systems (NIPS), 2016.
DOI : 10.1111/cgf.12264

D. Salas-Gonzalez, E. E. Kuruoglu, and D. P. Ruiz, Finite mixture of α-stable distributions, Digital Signal Processing, vol.19, issue.2, pp.250-264, 2009.
DOI : 10.1016/j.dsp.2007.11.004

D. Salas-Gonzalez, E. E. Kuruoglu, and D. P. Ruiz, Modelling with mixture of symmetric stable distributions using Gibbs sampling, Signal Processing, vol.90, issue.3, pp.774-783, 2010.
DOI : 10.1016/j.sigpro.2009.07.003

S. Hosseini-Shojaei, V. Nassiri, G. R. Mohammadian, and A. Mohammadpour, Mixture of skewed alpha-stable distributions, AIP Conference Proceedings, vol.1305, pp.130-137, 2010.

B. Schölkopf, R. Herbrich, and A. J. Smola, A Generalized Representer Theorem, In: COLT, pp.416-426, 2001.
DOI : 10.1007/3-540-44581-1_27

A. J. Smola, A. Gretton, L. Song, and B. Schölkopf, A Hilbert Space Embedding for Distributions, International Conference on Algorithmic Learning Theory, pp.13-31, 2007.
DOI : 10.1007/978-3-540-75488-6_5

URL : http://www.kyb.tuebingen.mpg.de/publications/attachments/ALT-2007-Gretton_%5B0%5D.pdf

S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, Large Scale Multiple Kernel Learning, Journal of Machine Learning Research, vol.7, pp.1531-1565, 2006.

L. Song, X. Zhang, A. J. Smola, A. Gretton, and B. Schölkopf, Tailoring density estimation via reproducing kernel moment matching, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.992-1001, 2008.
DOI : 10.1145/1390156.1390281

URL : http://icml2008.cs.helsinki.fi/papers/377.pdf

L. Song, Learning via Hilbert space embedding of distributions, 2008.

J. Sun, Q. Qu, and J. Wright, When Are Nonconvex Problems Not Scary?, pp.1-6, 2015.

B. K. Sriperumbudur, K. Fukumizu, A. Gretton et al., Kernel choice and classifiability for RKHS embeddings of probability distributions, Advances in Neural Information Processing Systems (NIPS), 2009.

B. K. Sriperumbudur, A. Gretton, K. Fukumizu et al., Hilbert space embeddings and metrics on probability measures, The Journal of Machine Learning Research, vol.11, pp.1517-1561, 2010.

K. Sridharan, A Gentle Introduction to Concentration Inequalities, 2002.

B. K. Sriperumbudur, Mixture density estimation via Hilbert space embedding of measures, IEEE International Symposium on Information Theory, pp.1027-1030, 2011.

B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2001.

D. J. Sutherland and J. Schneider, On the Error of Random Fourier Features, 2015.

B. K. Sriperumbudur and Z. Szabó, Optimal rates for Random Fourier Features, Advances in Neural Information Processing Systems (NIPS), pp.1144-1152, 2015.

R. Spring and A. Shrivastava, Scalable and Sustainable Deep Learning via Randomized Hashing, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, 2016.

URL : http://arxiv.org/pdf/1602.08194

S. O. Sadjadi, M. Slaney, and L. Heck, MSR Identity Toolbox v1.0: A MATLAB toolbox for speaker-recognition research, Speech and Language Processing Technical Committee Newsletter, 2013.

H. Steinhaus, Sur la division des corps matériels en parties, Bull. Acad. Polon. Sci. Cl. III, vol.4, issue.12, pp.801-804, 1956.

D. J. Sutherland, J. B. Oliva, B. Póczos, and J. Schneider, Linear-time Learning on Distributions with Approximate Kernel Embeddings, 2015.

J. Tang, S. Alelyani, and H. Liu, Feature Selection for Classification: A Review, Data Classification: Algorithms and Applications, pp.37-64, 2014.

Y. Traonmilin and R. Gribonval, Stable recovery of low-dimensional cones in Hilbert spaces: One RIP to rule them all, Applied and Computational Harmonic Analysis, 2015.
DOI : 10.1016/j.acha.2016.08.004

URL : https://hal.archives-ouvertes.fr/hal-01207987

N. Thaper, S. Guha, P. Indyk, and N. Koudas, Dynamic multidimensional histograms, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, SIGMOD '02, pp.428-439, 2002.
DOI : 10.1145/564691.564741

URL : http://www.cs.rochester.edu/~taoli/summer2004/streams/dynamic_multidim.ps.gz

R. Tibshirani, Regression shrinkage and selection via the lasso: a retrospective, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.73, issue.3, pp.273-282, 2011.

K. C. Tran, Estimating mixtures of normal distributions via empirical characteristic function, Econometric Reviews, vol.17, issue.2, pp.167-183, 1998.

N. Tremblay, G. Puy, P. Borgnat, R. Gribonval, and P. Vandergheynst, Accelerated spectral clustering using graph filtering of random signals, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4094-4098, 2016.
DOI : 10.1109/ICASSP.2016.7472447

URL : https://hal.archives-ouvertes.fr/hal-01243682

N. Tremblay, G. Puy, R. Gribonval, and P. Vandergheynst, Compressive Spectral Clustering, Proceedings of The 33rd International Conference on Machine Learning, pp.1-15, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01320214

P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, Journal of Optimization Theory and Applications, vol.109, issue.3, pp.475-494, 2001.

V. Vapnik, The Nature of Statistical Learning Theory, 1995.

M. Veillette, STBL: a MATLAB library for working with alpha stable distributions, 2012.

A. Vedaldi and B. Fulkerson, VLFeat: An open and portable library of computer vision algorithms, 2010.

S. V. N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt, Graph Kernels, Journal of Machine Learning Research, vol.11, pp.1201-1242, 2010.

S. Vempala and G. Wang, A spectral algorithm for learning mixture models, Journal of Computer and System Sciences, vol.68, issue.4, pp.841-860, 2004.
DOI : 10.1016/j.jcss.2003.11.008

URL : https://doi.org/10.1016/j.jcss.2003.11.008

A. Vedaldi and A. Zisserman, Efficient additive kernels via explicit feature maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.480-492, 2012.
DOI : 10.1109/cvpr.2010.5539949

URL : http://eprints.pascal-network.org/archive/00006964/01/vedaldi10.pdf

A. G. Wilson and R. P. Adams, Gaussian process kernels for pattern discovery and extrapolation, International Conference on Machine Learning (ICML), pp.1067-1075, 2013.

C. Williams and M. W. Seeger, Using the Nyström Method to Speed Up Kernel Machines, NIPS Proceedings, pp.682-688, 2001.

F. X. Yu, A. T. Suresh, K. Choromanski, D. Holtmann-Rice et al., Orthogonal Random Features, Advances in Neural Information Processing Systems (NIPS), 2016.

D. Xu and J. Knight, Continuous Empirical Characteristic Function Estimation of Mixtures of Normal Parameters, Econometric Reviews, vol.58, issue.1, pp.25-50, 2010.
DOI : 10.1081/ETC-120039605

Z. Yang, A. J. Smola, L. Song, and A. G. Wilson, A la Carte - Learning Fast Kernels, International Conference on Artificial Intelligence and Statistics (AISTATS), pp.1098-1106, 2015.

J. Zhao and D. Meng, FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test, Neural Computation, vol.27, issue.6, pp.1345-1372, 2015.