N. S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician, vol.46, issue.3, pp.175-185, 1992.

D. R. Amancio, Comparing the topological properties of real and artificially generated scientific manuscripts, Scientometrics, vol.105, issue.3, pp.1763-1779, 2015.

G. Barbieri, F. Pachet, P. Roy, and M. Degli Esposti, Markov constraints for generating lyrics with style, Proceedings of the 20th European Conference on Artificial Intelligence, pp.115-120, 2012.

J. Beel and B. Gipp, Academic search engine spam and Google Scholar's resilience against it, 2010.

J. Beel, B. Gipp, and E. Wilde, Academic search engine optimization (ASEO), Journal of Scholarly Publishing, vol.41, issue.2, pp.176-190, 2010.

P. E. Bender and J. K. Wolf, New asymptotic bounds and improvements on the Lempel-Ziv data compression algorithm, IEEE Transactions on Information Theory, vol.37, issue.3, pp.721-729, 1991.

J. Bohannon, Who's afraid of peer review?, Science, vol.342, issue.6154, pp.60-65, 2013.

A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, Syntactic clustering of the web, Comput. Netw. ISDN Syst., vol.29, issue.8-13, pp.1157-1166, 1997.

A. Bulhak, On the simulation of postmodernism and mental debility using recursive transition networks, 1996.

J. A. Byrne and C. Labbé, Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines, Scientometrics, vol.110, issue.3, pp.1471-1493, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01487829

D. Cer, M.-C. de Marneffe, D. Jurafsky, and C. D. Manning, Parsing to Stanford dependencies: Trade-offs between speed and accuracy, Proceedings of LREC, 2010.

T. P. Jurka, L. Collingwood, A. E. Boydstun, E. Grossman, and W. van Atteveldt, RTextTools: A supervised learning package for text classification, The R Journal, vol.5, issue.1, pp.6-13, 2013.

N. J. Conroy, V. L. Rubin, and Y. Chen, Automatic deception detection: Methods for finding fake news, Proceedings of the Association for Information Science and Technology, vol.52, issue.1, pp.1-4, 2015.

A. Culotta and J. Sorensen, Dependency tree kernels for relation extraction, Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, 2004.

Dalkilic et al., Using compression to identify classes of inauthentic texts, Proc. of the 2006 SIAM Conf. on Data Mining, 2006.

E. Delgado López-Cózar, N. Robinson-García, and D. Torres-Salinas, The Google Scholar experiment: How to index false papers and manipulate bibliometric indicators, Journal of the Association for Information Science and Technology, vol.65, issue.3, pp.446-454, 2014.

Durán et al., Similarity of sentences through comparison of syntactic trees with pairs of similar words, Electrical Engineering, Computing Science and Automatic Control (CCE), pp.1-6, 2014.

Fahrenberg et al., Measuring global similarity between texts, Statistical Language and Speech Processing, pp.220-232, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01087009

I. Feinerer, K. Hornik, and D. Meyer, Text mining infrastructure in R, Journal of Statistical Software, vol.25, issue.5, pp.1-54, 2008.

J. R. Firth, A synopsis of linguistic theory, 1930-1955. Studies in linguistic analysis, 1957.

J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, vol.33, issue.1, pp.1-22, 2010.

J. Friedman, T. Hastie, and R. Tibshirani, Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors), The Annals of Statistics, vol.28, pp.337-407, 2000.

D. Ganguly, D. Roy, M. Mitra, and G. J. F. Jones, Word embedding based generalized language model for information retrieval, Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.795-798, 2015.

J. Hartley and G. Cabanac, What can new technology tell us about the reviewing process for journal submissions in BJET?, British Journal of Educational Technology, vol.48, issue.1, pp.212-220, 2017.

Herrera et al., UNED at PASCAL RTE-2 challenge, 2nd PASCAL Challenges Workshop on Recognising Textual Entailment, pp.38-43, 2006.

J. Huttenlocher, W. Haight, A. Bryk, M. Seltzer, and T. Lyons, Early vocabulary growth: Relation to language input and gender, Developmental Psychology, vol.27, issue.2, p.236, 1991.

J. Kao, More than a million pro-repeal net neutrality comments were likely, 2017.

A. Karpathy, The unreasonable effectiveness of, 2017.

D. Klein and C. D. Manning, Fast exact inference with a factored model for natural language parsing, pp.3-10, 2003.

S. Kullback, Information Theory and Statistics, 1959.

S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statist., vol.22, issue.1, pp.79-86, 1951.

C. Labbé, Ike Antkare one of the great stars in the scientific firmament, ISSI Newsletter, vol.6, issue.2, pp.48-52, 2010.

C. Labbé and D. Labbé, A tool for literary studies: Intertextual distance and tree classification, LLC, vol.21, issue.3, pp.311-326, 2006.

C. Labbé and D. Labbé, Detection of hidden intertextuality in the scientific publications, International Conference on Textual Data Statistical Analysis. JADT, 2012.

C. Labbé and D. Labbé, Duplicate and fake publications in the scientific literature: How many SCIgen papers in computer science?, Scientometrics, vol.94, issue.1, pp.379-396, 2013.

C. Labbé, D. Labbé, and . Labbé, Creativity and Universality in Language, chapter Detection of computer generated papers in scientific literature, Proceedings of the 12th International Conference on Textual Data Statistical Analysis, pp.323-336, 2014.

C. Labbé, D. Labbé, and F. Portet, Detection of Computer-Generated Papers in Scientific Literature, Creativity and Universality in Language, pp.123-141, 2016.

A. Lavoie and M. Krishnamoorthy, Algorithmic detection of computer generated text, 2010.

A. Liaw and M. Wiener, Classification and regression by randomForest, R News, vol.2, issue.3, pp.18-22, 2002.

E. Delgado López-Cózar, N. Robinson-García, and D. Torres-Salinas, Manipulating Google Scholar citations and Google Scholar metrics: Simple, easy and tempting, 2012.

O. Medelyan, E. Frank, and I. H. Witten, Human-competitive tagging using automatic keyphrase extraction, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol.3, pp.1318-1327, 2009.

T. Mikolov, Statistical language models based on neural networks, 2012.

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, 2013.

T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. Černocký, Strategies for training large scale neural network language models, Automatic Speech Recognition and Understanding (ASRU), pp.196-201, 2011.

T. Mikolov, Q. V. Le, and I. Sutskever, Exploiting similarities among languages for machine translation, 2013.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, pp.3111-3119, 2013.

T. Mikolov, W.-t. Yih, and G. Zweig, Linguistic regularities in continuous space word representations, Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.746-751, 2013.

Myers et al., Dilution assay statistics, Journal of Clinical Microbiology, vol.32, issue.3, pp.732-739, 1994.

M. Nguyen and C. Labbé, Engineering a tool to detect automatically generated papers, Proceedings of the Third Workshop on Bibliometric-enhanced Information Retrieval co-located with the 38th European Conference on Information Retrieval (ECIR 2016), pp.54-62, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01482265

H. Palangi, L. Deng, Y. Shen, J. Gao, X. He, J. Chen, X. Song, and R. Ward, Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval, IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol.24, issue.4, pp.694-707, 2016.

T. Pedersen, S. Patwardhan, and J. Michelizzi, WordNet::Similarity: Measuring the relatedness of concepts, Demonstration Papers at HLT-NAACL 2004, HLT-NAACL-Demonstrations '04, pp.38-41, 2004.

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp.1532-1543, 2014.

N. Phillips, Nature's 10: Ten people who mattered this year - Jennifer Byrne: Error sleuth, Nature, 2017.

Quimbaya et al., Named entity recognition over electronic health records through a combined dictionary-based approach, Procedia Computer Science, vol.100, pp.55-61, 2016.

J. Savoy, Authorship attribution based on specific vocabulary, ACM Trans. Inf. Syst., vol.30, issue.2, p.30, 2012.

R. J. Sethi, Crowdsourcing the verification of fake news and alternative facts, Proceedings of the 28th ACM Conference on Hypertext and Social Media, HT '17, pp.315-316, 2017.

K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, Fake news detection on social media: A data mining perspective, SIGKDD Explor. Newsl., vol.19, issue.1, pp.22-36, 2017.

A. Singhal, Modern Information Retrieval: A Brief Overview, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol.24, issue.4, pp.35-42, 2001.

I. Sochenkov et al., Exactus Like: Plagiarism detection in scientific texts, European Conference on Information Retrieval, pp.837-840, 2016.

A. D. Sokal, Transgressing the boundaries: Toward a transformative hermeneutics of quantum gravity, Social Text, vol.46, pp.217-252, 1996.

I. Sutskever, J. Martens, and G. E. Hinton, Generating text with recurrent neural networks, Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp.1017-1024, 2011.

D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin, Learning sentiment-specific word embedding for Twitter sentiment classification, ACL (1), pp.1555-1565, 2014.

R. Van Noorden, Publishers withdraw more than 120 gibberish papers, Nature News, 2014.

S. Volkova, K. Shaffer, J. Y. Jang, and N. Hodas, Separating facts from fiction: Linguistic models to classify suspicious and trusted news posts on Twitter, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.647-653, 2017.

R. Wang and G. Neumann, Recognizing textual entailment using sentence similarity based on dependency tree skeletons, Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, RTE '07, pp.36-41, 2007.

K. Williams and C. L. Giles, On the use of similarity search to detect fake scientific papers, Similarity Search and Applications - 8th International Conference, SISAP 2015, pp.332-338, 2015.

J. Xiong and T. Huang, An effective method to identify machine automatically generated paper, Knowledge Engineering and Software Engineering, pp.101-102, 2009.

J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Trans. Inf. Theor., vol.23, issue.3, pp.337-343, 1977.

W. Y. Zou, R. Socher, D. Cer, and C. D. Manning, Bilingual word embeddings for phrase-based machine translation, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp.1393-1398, 2013.

D. Zubarev and I. Sochenkov, Using sentence similarity measure for plagiarism source retrieval, CLEF (Working Notes), pp.1027-1034, 2014.

G. Zweig and C. J. Burges, The Microsoft Research sentence completion challenge, 2011.