S. Clinchant and E. Gaussier, Retrieval constraints and word frequency distributions: a log-logistic model for IR, Information Retrieval, vol.14, issue.1, 2010.

J. Ah-Pine, M. Bressan, S. Clinchant, G. Csurka, Y. Hoppenot et al., Crossing textual and visual content in different application scenarios, Multimedia Tools and Applications, vol.3, issue.1, pp.31-56, 2009.
DOI : 10.1007/s11042-008-0246-8

URL : https://hal.archives-ouvertes.fr/hal-01504484

J. Ah-Pine, S. Clinchant, G. Csurka, and F. Perronnin, Leveraging Image, Text and Cross-media Similarities for Diversity-focused Multimedia Retrieval, in ImageCLEF: Experimental Evaluation in Visual Information Retrieval, Springer, The Information Retrieval Series, 2010.

S. Clinchant and E. Gaussier, Modèles de RI fondés sur l'information, Document Numérique.

S. Clinchant and E. Gaussier, Is document frequency important for PRF?, Proceeding of International Conference on the Theory of Information Retrieval, ICTIR 2011.

S. Clinchant et al., Semantic combination of textual and visual information in multimedia retrieval, International Conference on Multimedia Retrieval, 2011.

S. Clinchant and E. Gaussier, Information-based models for ad hoc IR, Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pp.234-241, 2010.
DOI : 10.1145/1835449.1835490

URL : https://hal.archives-ouvertes.fr/hal-00953830

S. Clinchant and E. Gaussier, Bridging Language Modeling and Divergence from Randomness Models: A Log-Logistic Model for IR, Proceeding of International Conference on the Theory of Information Retrieval, pp.54-65, 2009.
DOI : 10.1145/984321.984322

S. Clinchant and E. Gaussier, The BNB Distribution for Text Modeling, European Conference in Information Retrieval ECIR, pp.150-161, 2008.
DOI : 10.1007/978-3-540-78646-7_16

S. Clinchant, C. Goutte, and E. Gaussier, Lexical entailment for information retrieval, ECIR, pp.217-228, 2006.
DOI : 10.1007/11735106_20

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

S. Clinchant and E. Gaussier, A document frequency constraint for pseudo-relevance feedback models, CORIA, pp.73-88, 2011.

S. Clinchant and E. Gaussier, Modèles de RI fondés sur l'information, CORIA, pp.99-114, 2010 (best paper award).

S. Clinchant and E. Gaussier, Do IR models satisfy the TDC constraint?, poster, SIGIR '11, to appear.

G. Csurka, S. Clinchant, and G. Jacquet, Medical Image Modality Classification and Retrieval, 9th International Workshop on Content-Based Multimedia Indexing.

S. Clinchant and E. Gaussier, Retrieval constraints and word frequency distributions, Proceeding of the 18th ACM conference on Information and knowledge management, CIKM '09, pp.1975-1978, 2009.
DOI : 10.1145/1645953.1646280

URL : https://hal.archives-ouvertes.fr/hal-00742020

G. Amati, C. Carpineto, and G. Romano, Fondazione Ugo Bordoni at TREC 2003: robust and web track, 2003.

G. Amati and C. J. van Rijsbergen, Probabilistic models of information retrieval based on measuring the divergence from randomness, ACM Transactions on Information Systems, vol.20, issue.4, pp.357-389, 2002.
DOI : 10.1145/582415.582416

A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh, On smoothing and inference for topic models, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.

R. H. Baayen, Word Frequency Distributions, Kluwer Academic, vol.18, 2001.
DOI : 10.1007/978-94-010-0844-0

R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, 2008.

A. L. Barabasi and R. Albert, Emergence of scaling in random networks, Science, vol.286, issue.5439, pp.509-512, 1999.

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

W. Buntine and A. Jakulin, Applying discrete PCA in data analysis, AUAI '04: Proceedings of the 20th conference on Uncertainty in artificial intelligence, pp.59-66, 2004.

C. Burges, T. Shaked, E. Renshaw, M. Deeds, N. Hamilton et al., Learning to rank using gradient descent, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.89-96, 2005.
DOI : 10.1145/1102351.1102363

D. Chakrabarti and C. Faloutsos, Graph mining, ACM Computing Surveys, vol.38, issue.1, 2006.
DOI : 10.1145/1132952.1132954

K. Church and W. A. Gale, Inverse Document Frequency (IDF): A Measure of Deviations from Poisson, Proceedings of the Third Workshop on Very Large Corpora, pp.121-130, 1995.
DOI : 10.1007/978-94-017-2390-9_18

K. W. Church, Empirical estimates of adaptation, Proceedings of the 18th conference on Computational linguistics, pp.180-186, 2000.
DOI : 10.3115/990820.990847

K. W. Church and W. A. Gale, Poisson mixtures, Natural Language Engineering, vol.1, pp.163-190, 1995.

S. Clinchant and E. Gaussier, The BNB Distribution for Text Modeling, European Conference in Information Retrieval ECIR, pp.150-161, 2008.
DOI : 10.1007/978-3-540-78646-7_16

S. Clinchant and E. Gaussier, Bridging Language Modeling and Divergence from Randomness Models: A Log-Logistic Model for IR, ICTIR, pp.54-65, 2009.
DOI : 10.1145/984321.984322

S. Clinchant and E. Gaussier, Retrieval constraints and word frequency distributions, Proceeding of the 18th ACM conference on Information and knowledge management, CIKM '09, pp.1975-1978, 2009.
DOI : 10.1145/1645953.1646280

URL : https://hal.archives-ouvertes.fr/hal-00742020

S. Clinchant and E. Gaussier, Information-based models for ad hoc IR, Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pp.234-241, 2010.
DOI : 10.1145/1835449.1835490

URL : https://hal.archives-ouvertes.fr/hal-00953830

S. Clinchant and E. Gaussier, Modèles de RI fondés sur l'information, CORIA, pp.99-114, 2010.
DOI : 10.3166/dn.14.2.103-123

S. Clinchant and E. Gaussier, Retrieval constraints and word frequency distributions, Proceeding of the 18th ACM conference on Information and knowledge management, CIKM '09, 2010.
DOI : 10.1145/1645953.1646280

URL : https://hal.archives-ouvertes.fr/hal-00742020

S. Clinchant and E. Gaussier, A document frequency constraint for pseudo-relevance feedback models, CORIA, pp.73-88, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00744097

S. Clinchant and E. Gaussier, Is document frequency important for PRF?, ICTIR, to appear, 2011.

K. Collins-Thompson, Estimating robust query models with convex optimization, NIPS, pp.329-336, 2008.

K. Collins-Thompson, Reducing the risk of query expansion via robust constrained optimization, Proceeding of the 18th ACM conference on Information and knowledge management, CIKM '09, pp.837-846, 2009.
DOI : 10.1145/1645953.1646059

K. Collins-Thompson and J. Callan, Estimation and use of uncertainty in pseudo-relevance feedback, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '07, pp.303-310, 2007.
DOI : 10.1145/1277741.1277795

D. W. Crabtree, P. Andreae, and X. Gao, Exploiting underrepresented query aspects for automatic query expansion, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '07, pp.191-200, 2007.
DOI : 10.1145/1281192.1281216

R. Cummins and C. O'Riordan, An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions, Artificial Intelligence Review, vol.8, issue.5, pp.51-68, 2007.
DOI : 10.1007/s10462-008-9074-5

S. Deerwester, Improving Information Retrieval with Latent Semantic Indexing, Proceedings of the 51st ASIS Annual Meeting (ASIS '88), 1988.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, vol.39, issue.1, pp.1-38, 1977.

J. V. Dillon and K. Collins-Thompson, A unified optimization framework for robust pseudo-relevance feedback algorithms, Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM '10, pp.1069-1078, 2010.
DOI : 10.1145/1871437.1871573

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

C. Durot, Testing Convexity or Concavity of a Cumulated Hazard Rate, IEEE Transactions on Reliability, vol.57, issue.3, pp.465-473, 2008.
DOI : 10.1109/TR.2008.928181

C. Elkan, Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.289-296, 2006.
DOI : 10.1145/1143844.1143881

S. E. Fienberg, E. M. Airoldi, and W. W. Cohen, Statistical models for frequent terms in text, CMU-CALD, 2004.

H. Fang, T. Tao, and C. Zhai, A formal study of information retrieval heuristics, Proceedings of the 27th annual international conference on Research and development in information retrieval, SIGIR '04, 2004.
DOI : 10.1145/1008992.1009004

H. Fang and C. Zhai, Semantic term matching in axiomatic approaches to information retrieval, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '06, pp.115-122, 2006.
DOI : 10.1145/1148170.1148193

Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, An efficient boosting algorithm for combining preferences, J. Mach. Learn. Res, vol.4, pp.933-969, 2003.

M. D. Gordon and P. Lenk, When is the probability ranking principle suboptimal?, JASIS, vol.43, issue.1, pp.1-14, 1992.

S. P. Harter, A probabilistic approach to automatic keyword indexing, Journal of the American Society for Information Science, vol.26, 1975.

D. Hiemstra, S. Robertson, and H. Zaragoza, Parsimonious language models for information retrieval, Proceedings of the 27th annual international conference on Research and development in information retrieval, SIGIR '04, pp.178-185, 2004.
DOI : 10.1145/1008992.1009025

K. Hoashi, K. Matsumoto, N. Inoue, and K. Hashimoto, Query expansion based on predictive algorithms for collaborative filtering, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '01, pp.414-415, 2001.
DOI : 10.1145/383952.384063

T. Hofmann, Probabilistic latent semantic indexing, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '99, pp.50-57, 1999.
DOI : 10.1145/312624.312649

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

M. Jansche, Parametric models of linguistic count data, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, ACL '03, pp.288-295, 2003.
DOI : 10.3115/1075096.1075133

H. Jégou, M. Douze, and C. Schmid, On the burstiness of visual elements, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206609

N. L. Johnson and S. Kotz, Distributions in statistics: continuous multivariate distributions, 1972.

S. M. Katz, Distribution of content words and phrases in text and language modelling, Nat. Lang. Eng, vol.2, issue.1, pp.15-59, 1996.

J. Lafferty and C. Zhai, Document language models, query models, and risk minimization for information retrieval, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '01, pp.111-119, 2001.
DOI : 10.1145/383952.383970

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

V. Lavrenko, M. Choquette, and W. B. Croft, Cross-lingual relevance models, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '02, pp.175-182, 2002.
DOI : 10.1145/564376.564408

V. Lavrenko and W. B. Croft, Relevance based language models, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '01, pp.120-127, 2001.
DOI : 10.1145/383952.383972

K. S. Lee, W. B. Croft, and J. Allan, A cluster-based resampling method for pseudo-relevance feedback, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '08, pp.235-242, 2008.

Y. Lv and C. Zhai, Adaptive relevance feedback in information retrieval, Proceeding of the 18th ACM conference on Information and knowledge management, CIKM '09, pp.255-264, 2009.
DOI : 10.1145/1645953.1645988

Y. Lv and C. Zhai, A comparative study of methods for estimating query language models with pseudo feedback, Proceeding of the 18th ACM conference on Information and knowledge management, CIKM '09, pp.1895-1898, 2009.
DOI : 10.1145/1645953.1646259

Y. Lv and C. Zhai, Positional relevance model for pseudo-relevance feedback, Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pp.579-586, 2010.
DOI : 10.1145/1835449.1835546

R. E. Madsen, D. Kauchak, and C. Elkan, Modeling word burstiness using the Dirichlet distribution, ACM International Conference Proceeding Series, pp.545-552, 2005.

C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, 2008.
DOI : 10.1017/CBO9780511809071

C. D. Manning and H. Schütze, Foundations of statistical natural language processing, 1999.

E. L. Margulis, N-Poisson document modelling, Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '92, pp.177-189, 1992.
DOI : 10.1145/133160.133195

A. McCallum and K. Nigam, A comparison of event models for naive Bayes text classification, The Fifteenth National Conference on Artificial Intelligence (AAAI), 1998.

Q. Mei, H. Fang, and C. Zhai, A study of Poisson query generation model for information retrieval, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '07, pp.319-326, 2007.
DOI : 10.1145/1277741.1277797

T. Minka, Estimating a Dirichlet Distribution, 2003.

R. Nallapati, T. Minka, and S. Robertson, The smoothed-Dirichlet distribution: a new building block for generative models, 2006.

R. Nallapati, Discriminative models for information retrieval, Proceedings of the 27th annual international conference on Research and development in information retrieval, SIGIR '04, pp.64-71, 2004.
DOI : 10.1145/1008992.1009006

J. Naudts, Abstract, Open Physics, vol.7, issue.3, p.12003, 2010.
DOI : 10.2478/s11534-008-0150-x

J. Nie, Cross-Language Information Retrieval. Synthesis Lectures on Human Language Technologies, 2010.

K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning, pp.103-134, 1999.

P. Ogilvie and J. P. Callan, Experiments Using the Lemur Toolkit, Text REtrieval Conference, 2001.

J. M. Ponte and W. B. Croft, A language modeling approach to information retrieval, SIGIR, pp.275-281, 1998.

J. D. Rennie, The log-log term frequency distribution, 2005.

L. Rigouste, Méthodes probabilistes pour l'analyse exploratoire de données textuelles, PhD thesis, ENST, 2006.

S. Robertson and H. Zaragoza, The probabilistic relevance framework: BM25 and beyond

S. E. Robertson, The probability ranking principle in IR, Journal of Documentation, vol.33, issue.4, pp.294-304, 1977.
DOI : 10.1108/eb026647

S. E. Robertson and S. Walker, Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval, SIGIR '94: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pp.232-241, 1994.
DOI : 10.1007/978-1-4471-2099-5_24

G. Salton and M. J. Mcgill, Introduction to Modern Information Retrieval, 1983.

A. Sarkar, P. H. Garthwaite, and A. De Roeck, A Bayesian mixture model for term re-occurrence and burstiness, Proceedings of the Ninth Conference on Computational Natural Language Learning, CONLL '05, pp.48-55, 2005.
DOI : 10.3115/1706543.1706552

J. Seo and W. B. Croft, Geometric representations for multiple documents, Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pp.251-258, 2010.
DOI : 10.1145/1835449.1835493

C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal, pp.379-423, 1948.

A. Singhal, C. Buckley, and M. Mitra, Pivoted document length normalization, Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '96, pp.21-29, 1996.
DOI : 10.1145/243199.243206

T. Tao and C. Zhai, Regularized estimation of mixture models for robust pseudo-relevance feedback, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '06, pp.162-169, 2006.
DOI : 10.1145/1148170.1148201

Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, Hierarchical Dirichlet Processes, Journal of the American Statistical Association, vol.101, issue.476, 2004.
DOI : 10.1198/016214506000000302

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

I. Ounis, V. Plachouras, and B. He, University of Glasgow at TREC 2004: Experiments in web, robust and terabyte tracks with Terrier, 2004.

J. Xu, A boosting algorithm for information retrieval, Proceedings of SIGIR '07, 2007.

Z. Xu and R. Akella, A new probabilistic retrieval model based on the Dirichlet compound multinomial distribution, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '08, pp.427-434, 2008.
DOI : 10.1145/1390334.1390408

Y. Yue and T. Finley, A support vector method for optimizing average precision, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '07, pp.271-278, 2007.
DOI : 10.1145/1277741.1277790

C. Zhai and J. Lafferty, Model-based feedback in the language modeling approach to information retrieval, Proceedings of the tenth international conference on Information and knowledge management, CIKM '01, pp.403-410, 2001.
DOI : 10.1145/502585.502654

C. Zhai and J. Lafferty, A study of smoothing methods for language models applied to Ad Hoc information retrieval, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '01, pp.334-342, 2001.
DOI : 10.1145/383952.384019