R. Affandi, E. Fox, and B. Taskar, Approximate inference in continuous determinantal processes, Adv. NIPS, 2013.

R. Affandi, E. Fox, R. Adams, and B. Taskar, Learning the parameters of determinantal point process kernels, Proc. ICML, 2014.

C. Aggarwal and C. Zhai, Mining text data, 2012.

L. Alsumait, D. Barbará, J. Gentle, and C. Domeniconi, Topic Significance Ranking of LDA Generative Models, Proc. ECML, 2009.
DOI : 10.1145/1150402.1150450

K. Atkinson and W. Han, Spherical Harmonics and Approximations on the Unit Sphere: an Introduction, 2012.
DOI : 10.1007/978-3-642-25983-8

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Adv. NIPS, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

R. Bardenet and M. Titsias, Inference for determinantal point processes without spectral knowledge, Adv. NIPS, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01245315

D. Batra, P. Yadollahpour, A. Guzman-Rivera, and G. Shakhnarovich, Diverse M-Best Solutions in Markov Random Fields, Proc. ECCV, 2012.
DOI : 10.1007/978-3-642-33715-4_1

J. Becker and D. Kuropka, Topic-based vector space model, Proc. ICBIS, 2003.

S. Bird, E. Klein, and E. Loper, Natural language processing with Python, 2009.

C. Bishop, Pattern Recognition and Machine Learning, 2006.

D. Blei and J. Lafferty, Dynamic topic models, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143859

D. Blei and J. Lafferty, A correlated topic model of Science, The Annals of Applied Statistics, pp.17-35, 2007.

D. Blei, A. Ng, and M. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

D. Blei, T. Griffiths, and M. Jordan, The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies, Journal of the ACM, vol.57, issue.2, p.7, 2010.
DOI : 10.1145/1667053.1667056

A. Bordes, S. Chopra, and J. Weston, Question answering with subgraph embeddings, arXiv preprint, 2014.
DOI : 10.3115/v1/d14-1067
URL : http://arxiv.org/pdf/1406.3676

A. Borodin and E. Rains, Eynard–Mehta Theorem, Schur Process, and their Pfaffian Analogs, Journal of Statistical Physics, vol.121, issue.3-4, pp.291-317, 2005.
DOI : 10.5802/aif.1526
URL : http://arxiv.org/pdf/math-ph/0409059v2.pdf

L. Bottou, Online learning and stochastic approximations, On-line learning in neural networks, 1998.

P. Brown, P. deSouza, R. Mercer, V. Della Pietra, and J. Lai, Class-based n-gram models of natural language, Computational Linguistics, vol.18, issue.4, pp.467-479, 1992.

V. Brunel, A. Moitra, P. Rigollet, and J. Urschel, Maximum likelihood estimation of determinantal point processes, 2017.

W. Buntine and A. Jakulin, Discrete principal component analysis, Proc. of the Subspace, Latent Structure and Feature Selection Techniques: Statistical and Optimisation perspectives Workshop, 2005.

O. Cappé and E. Moulines, On-line expectation-maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.71, issue.3, pp.593-613, 2009.
DOI : 10.1111/j.1467-9868.2009.00698.x

G. Casella and R. Berger, Statistical Inference, Biometrics, vol.49, issue.1, 2002.
DOI : 10.2307/2532634

G. Casella and E. George, Explaining the Gibbs sampler, The American Statistician, pp.167-174, 1992.

J. Chang, S. Gerrish, C. Wang, J. Boyd-Graber, and D. Blei, Reading tea leaves: How humans interpret topic models, 2009.

I. Colin and C. Dupuy, Decentralized topic modelling with latent Dirichlet allocation, arXiv preprint, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01383111

S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science, vol.41, issue.6, 1990.
DOI : 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
URL : http://www.cs.bham.ac.uk/~pxt/IDA/lsa_ind.pdf

B. Delyon, M. Lavielle, and E. Moulines, Convergence of a stochastic approximation version of the EM algorithm, The Annals of Statistics, pp.94-128, 1999.

A. Dempster, N. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, issue.1, pp.1-38, 1977.

D. Dey, V. Ramakrishna, M. Hebert, and J. Bagnell, Predicting Multiple Structured Visual Interpretations, 2015 IEEE International Conference on Computer Vision (ICCV), pp.2947-2955, 2015.
DOI : 10.1109/ICCV.2015.337

I. Dhillon and S. Sra, Generalized nonnegative matrix approximations with Bregman divergences, Adv. NIPS, 2005.

Q. Diao, M. Qiu, C. Wu, A. J. Smola, J. Jiang et al., Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS), Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '14, 2014.
DOI : 10.1145/2623330.2623758

J. Djolonga and A. Krause, From MAP to marginals: Variational inference in Bayesian submodular models, Adv. NIPS, 2014.

J. Djolonga, S. Tschiatschek, and A. Krause, Variational inference in mixed probabilistic submodular models, Adv. NIPS, 2016.

S. Dumais, Latent semantic indexing (LSI), The Second Text REtrieval Conference, 1994.

C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, vol.1, issue.3, pp.211-218, 1936.
DOI : 10.1007/BF02288367

P. Felzenszwalb and D. McAllester, The generalized A* architecture, Journal of Artificial Intelligence Research, vol.29, pp.153-190, 2007.

J. Foulds, S. Kumar, and L. Getoor, Latent topic networks: A versatile probabilistic programming framework for topic models, Proc. ICML, 2015.

Y. Gao, J. Chen, and J. Zhu, Streaming Gibbs sampling for LDA model, 2016.

M. Gartrell, U. Paquet, and N. Koenigstein, Low-rank factorization of determinantal point processes for recommendation, 2016.

J. Gillenwater, A. Kulesza, and B. Taskar, Discovering diverse and salient threads in document collections, Proc. EMNLP, 2012.

J. Gillenwater, A. Kulesza, and B. Taskar, Near-optimal MAP inference for determinantal point processes, Adv. NIPS, 2012.

J. Gillenwater, A. Kulesza, E. Fox, and B. Taskar, Expectation-maximization for learning determinantal point processes, Adv. NIPS, 2014.

T. Griffiths and M. Steyvers, A probabilistic approach to semantic representation, Proc. CogSci, 2002.

T. Griffiths and M. Steyvers, Finding scientific topics, Proceedings of the National Academy of Sciences, vol.101, issue.Suppl 1, pp.5228-5235, 2004.
DOI : 10.1073/pnas.0307752101
URL : http://www.pnas.org/content/101/suppl_1/5228.full.pdf

B. Haeffele, E. Young, and R. Vidal, Structured low-rank matrix factorization: Optimality, algorithm, and applications to image processing, Proc. ICML, 2014.

W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika, vol.57, issue.1, pp.97-109, 1970.
DOI : 10.1093/biomet/57.1.97

D. Hiemstra, A probabilistic justification for using tf×idf term weighting in information retrieval, International Journal on Digital Libraries, vol.3, issue.2, pp.131-139, 2000.
DOI : 10.1007/s007999900025

M. Hoffman and D. Blei, Structured stochastic variational inference, Proc. AISTATS, 2015.

M. Hoffman, D. Blei, and F. Bach, Online learning for latent Dirichlet allocation, 2010.

M. Hoffman, D. Blei, C. Wang, and J. Paisley, Stochastic variational inference, Journal of Machine Learning Research, vol.14, issue.1, pp.1303-1347, 2013.

T. Hofmann, Probabilistic latent semantic analysis, Proc. UAI, 1999.

T. Hofmann, Probabilistic latent semantic indexing, Proc. ACM SIGIR, 1999.
DOI : 10.1145/3130348.3130370
URL : http://www-connex.lip6.fr/~amini/././RelatedWorks/Hof99.pdf

G. Huang, C. Guo, M. Kusner, Y. Sun, F. Sha et al., Supervised word mover's distance, Adv. NIPS, 2016.

A. Hyvärinen, J. Karhunen, and E. Oja, Independent component analysis, 2004.

F. Jelinek and R. Mercer, Interpolated estimation of Markov source parameters from sparse data, Proc. Workshop on Pattern Recognition in Practice, 1980.

Y. Jo and A. Oh, Aspect and sentiment unification model for online review analysis, Proceedings of the fourth ACM international conference on Web search and data mining, WSDM '11, 2011.
DOI : 10.1145/1935826.1935932
URL : http://uilab.kaist.ac.kr/research/WSDM11/wsdm400-jo.pdf

B. Kang, Fast determinantal point process sampling with application to clustering, Adv. NIPS, 2013.

N. Kantas, A. Doucet, S. Singh, J. Maciejowski, and N. Chopin, On Particle Methods for Parameter Estimation in State-Space Models, Statistical Science, vol.30, issue.3, pp.328-351, 2015.
DOI : 10.1214/14-STS511

S. Katz, Estimation of probabilities from sparse data for the language model component of a speech recognizer, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.35, issue.3, pp.400-401, 1987.
DOI : 10.1109/TASSP.1987.1165125

A. Kirillov, B. Savchynskyy, D. Schlesinger, D. Vetrov, and C. Rother, Inferring M-Best Diverse Labelings in a Single One, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.211

R. Kneser and H. Ney, Improved backing-off for M-gram language modeling, 1995 International Conference on Acoustics, Speech, and Signal Processing, 1995.
DOI : 10.1109/ICASSP.1995.479394

D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques, 2009.

Y. Koren, R. Bell, and C. Volinsky, Matrix Factorization Techniques for Recommender Systems, Computer, vol.42, issue.8, pp.30-37, 2009.
DOI : 10.1109/MC.2009.263
URL : http://research.yahoo.com/files/ieeecomputer.pdf

A. Krause and D. Golovin, Submodular function maximization, Tractability: Practical Approaches to Hard Problems, p.8, 2012.
DOI : 10.1017/cbo9781139177801.004
URL : http://www.cs.cmu.edu/%7Edgolovin/papers/submodular_survey12.pdf

A. Kulesza and B. Taskar, k-DPPs: Fixed-size determinantal point processes, Proc. ICML, 2011.

A. Kulesza and B. Taskar, Determinantal Point Processes for Machine Learning, Machine Learning, pp.123-286, 2012.
DOI : 10.1561/2200000044

H. Kushner and G. Yin, Stochastic approximation and recursive algorithms and applications, 2003.

M. Kusner, Y. Sun, N. Kolkin, and K. Weinberger, From word embeddings to document distances, Proc. ICML, 2015.

J. H. Lau, D. Newman, and T. Baldwin, Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality, EACL, 2014.
DOI : 10.3115/v1/e14-1056

F. Lavancier, J. Møller, and E. Rubak, Determinantal point process models and statistical inference, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.77, issue.4, pp.853-877, 2015.
DOI : 10.1111/rssb.12096
URL : https://hal.archives-ouvertes.fr/hal-01241077

D. Lee and H. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, pp.788-791, 1999.

E. Lehmann and G. Casella, Theory of point estimation, 1998.

J. Leskovec and A. Krevl, SNAP Datasets: Stanford large network dataset collection, 2014.

A. Lewis and M. Overton, Nonsmooth optimization via quasi-Newton methods, Mathematical Programming, pp.135-163, 2013.
DOI : 10.1007/s10107-012-0514-2
URL : http://www.cs.nyu.edu/faculty/overton/papers/pdffiles/nsoquasi.pdf

C. Li, S. Jegelka, and S. Sra, Efficient sampling for k-determinantal point processes, Proc. AISTATS, 2016.

C. Li, S. Jegelka, and S. Sra, Fast DPP sampling for Nyström with application to kernel methods, Proc. ICML, 2016.

P. Liang and D. Klein, Online EM for unsupervised models, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics on, NAACL '09, 2009.
DOI : 10.3115/1620754.1620843
URL : http://www.cs.berkeley.edu/~pliang/papers/online-naacl2009.pdf

M. Lichman, UCI machine learning repository, 2013. URL http

C. Lin and Y. He, Joint sentiment/topic model for sentiment analysis, Proceeding of the 18th ACM conference on Information and knowledge management, CIKM '09, 2009.
DOI : 10.1145/1645953.1646003

G. Ling, M. R. Lyu, and I. King, Ratings meet reviews, a combined approach to recommend, Proceedings of the 8th ACM Conference on Recommender systems, RecSys '14, 2014.
DOI : 10.1145/2645710.2645728

Z. Mariet and S. Sra, Fixed-point algorithms for learning determinantal point processes, Proc. ICML, 2015.

Z. Mariet and S. Sra, Kronecker determinantal point processes, arXiv preprint, 2016.

J. McAuley and J. Leskovec, Hidden factors and hidden topics, Proceedings of the 7th ACM conference on Recommender systems, RecSys '13, 2013.
DOI : 10.1145/2507157.2507163

J. McAuliffe and D. Blei, Supervised topic models, Adv. NIPS, 2008.

Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai, Topic sentiment mixture, Proceedings of the 16th international conference on World Wide Web, WWW '07, 2007.
DOI : 10.1145/1242572.1242596

N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, Equation of State Calculations by Fast Computing Machines, The Journal of Chemical Physics, vol.21, issue.6, pp.1087-1092, 1953.
DOI : 10.1063/1.1700747

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, 2013.

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Adv. NIPS, 2013.

D. Mimno, H. Wallach, E. Talley, M. Leenders, and A. McCallum, Optimizing semantic coherence in topic models, Proc. EMNLP, 2011.

D. Mimno, M. Hoffman, and D. Blei, Sparse stochastic inference for latent Dirichlet allocation, Proc. ICML, 2012.

T. Minka, Estimating a Dirichlet distribution, 2000.

C. Mooers, The theory of digital handling of non-numerical information and its implications to machine economics, Proc. ACM at Rutgers University, 1950.

K. Murphy, Machine learning: a probabilistic perspective, 2012.

R. Neal and G. Hinton, A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, Learning in graphical models, pp.355-368, 1998.
DOI : 10.1007/978-94-011-5014-9_12

D. Newman, J. Lau, K. Grieser, and T. Baldwin, Automatic evaluation of topic coherence, NAACL HLT, 2010.

D. Newman, E. Bonilla, and W. Buntine, Improving topic coherence with regularized topic models, Adv. NIPS, 2011.

K. Nigam, J. Lafferty, and A. McCallum, Using maximum entropy for text classification, IJCAI-99 workshop on machine learning for information filtering, 1999.

K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning, vol.39, issue.2/3, pp.103-134, 2000.
DOI : 10.1023/A:1007692713085

P. Paatero and U. Tapper, Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, vol.5, issue.2, pp.111-126, 1994.
DOI : 10.1002/env.3170050203

J. Paisley, C. Wang, D. Blei, and M. Jordan, Nested Hierarchical Dirichlet Processes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.2, 2014.
DOI : 10.1109/TPAMI.2014.2318728
URL : http://arxiv.org/pdf/1210.6738

K. Palla, F. Caron, and Y. Teh, Bayesian nonparametrics for sparse dynamic networks, 2016.

S. Patterson and Y. Teh, Stochastic gradient Riemannian Langevin dynamics on the probability simplex, Adv. NIPS, 2013.

M. Paul and M. Dredze, Factorial LDA: Sparse multi-dimensional text models, Adv. NIPS, 2012.

F. Pereira, N. Tishby, and L. Lee, Distributional clustering of English words, Proceedings of the 31st annual meeting on Association for Computational Linguistics, 1993.
DOI : 10.3115/981574.981598

A. Podosinnikova, F. Bach, and S. Lacoste-Julien, Rethinking LDA: moment matching for discrete ICA, Adv. NIPS, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01225271

B. Polyak and A. Juditsky, Acceleration of Stochastic Approximation by Averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.
DOI : 10.1137/0330046

J. Rennie, Improving multi-class text classification with naive Bayes, 2001.

F. Ricci, L. Rokach, and B. Shapira, Introduction to Recommender Systems Handbook, 2011.
DOI : 10.1007/978-0-387-85820-3_1

M. Röder, A. Both, and A. Hinneburg, Exploring the Space of Topic Coherence Measures, Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15, 2015.
DOI : 10.1145/2684822.2685324

D. Rohde and O. Cappé, Online maximum-likelihood estimation for latent factor models, 2011 IEEE Statistical Signal Processing Workshop (SSP), 2011.
DOI : 10.1109/SSP.2011.5967760

G. Salton, A. Wong, and C. Yang, A vector space model for automatic indexing, Communications of the ACM, vol.18, issue.11, pp.613-620, 1975.
DOI : 10.1145/361219.361220
URL : http://ecommons.cornell.edu/bitstream/1813/6057/1/74-218.pdf

I. Sato, K. Kurihara, and H. Nakagawa, Deterministic single-pass algorithm for LDA, Adv. NIPS, 2010.

B. Schölkopf and A. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2001.

J. Shawe-Taylor and N. Cristianini, Kernel methods for pattern analysis, 2004.
DOI : 10.1017/CBO9780511809682

Y. Teh, M. Jordan, M. Beal, and D. Blei, Hierarchical Dirichlet Processes, Journal of the American Statistical Association, vol.101, issue.476, pp.1566-1581, 2006.
DOI : 10.1198/016214506000000302
URL : http://www.cs.princeton.edu/~blei/papers/TehJordanBealBlei2006.pdf

I. Titov and R. McDonald, Modeling online reviews with multi-grain topic models, Proceeding of the 17th international conference on World Wide Web, WWW '08, 2008.
DOI : 10.1145/1367497.1367513
URL : http://cui.unige.ch/~titov/papers/www08.pdf

M. Titterington, Recursive parameter estimation using incomplete data, Journal of the Royal Statistical Society. Series B (Methodological), vol.46, issue.2, pp.257-267, 1984.
DOI : 10.21236/ADA116190

J. Urschel, V. Brunel, A. Moitra, and P. Rigollet, Learning determinantal point processes with moments and cycles, arXiv preprint, 2017.

A. van der Vaart, Asymptotic Statistics, 2000.

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990517

H. Wallach, Topic modeling, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143967

H. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno, Evaluation methods for topic models, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553515
URL : http://www.cs.umass.edu/~wallach/publications/wallach09evaluation.pdf

C. Wang and D. Blei, Truncation-free online variational inference for Bayesian nonparametric models, Adv. NIPS, 2012.

C. Wang, D. Blei, and D. Heckerman, Continuous time dynamic topic models, arXiv preprint, 2012.

G. Wei and M. Tanner, A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms, Journal of the American Statistical Association, vol.85, issue.411, pp.699-704, 1990.
DOI : 10.1080/01621459.1990.10474930

F. Yan, N. Xu, and Y. Qi, Parallel inference for latent Dirichlet allocation on graphics processing units, Adv. NIPS, 2009.

H. Zhao, B. Jiang, and J. Canny, SAME but Different, Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, 2015.
DOI : 10.1145/2623330.2623756

W. Zou, R. Socher, D. Cer, and C. Manning, Bilingual word embeddings for phrase-based machine translation, Proc. EMNLP, 2013.