5-5 Transformer block as successive parsing step and composition, p.80

6-9 t-SNE visualization of the softmax weights from our DiscoveryBig model for each discourse marker, quote from Pragmatic Markers, 1996.

Frequency distribution of discourse markers in …, Language processing systems for multiple tasks, vol.134, p.138, 2019.

Discourse markers or classes used by previous work on unsupervised representation learning

Accuracy when predicting candidate discourse markers using shallow lexical features

Candidate discourse markers that are the most difficult to predict from shallow features

Candidate discourse markers that are the easiest to predict from shallow features

Discovery linguistic probing evaluation

Transfer evaluation tasks

Test results on implicit discursive relation prediction task, p.96

Transfer test accuracies across DiscEval tasks

Aggregated transfer test accuracies across DiscEval, p.111

Transfer test accuracies across DiscEval subtasks, p.112

Transfer F1 scores across the categories of DiscEval tasks, p.113

Discourse marker prediction accuracy

Categories and most associated markers

Examples of the Discovery datasets illustrating various relation senses, p.122

Evaluation of state-of-the-art models on the DiscEval tasks, p.140

Discourse markers, p.142

E. Agirre, M. Diab, D. Cer, and A. Gonzalez-Agirre, Semeval-2012 task 6: A pilot on semantic textual similarity, Proceedings of the First Joint Conference on Lexical and Computational Semantics, vol.1, pp.385-393, 2012.

J. Allen and M. Core, Draft of DAMSL: Dialog act markup in several layers, 1997.

I. K. Ampomah, S. Park, and S. Lee, A Sentence-to-Sentence Relation Network for Recognizing Textual Entailment, World Academy of Science, Engineering and Technology International Journal of Computer and Information Engineering, vol.10, issue.12, pp.1955-1958, 2016.

D. Arpit, S. Jastrzebski, N. Ballas, D. Krueger, E. Bengio et al., , 2017.

N. Asher and A. Lascarides, Logics of conversation, 2003.

F. T. Asr and V. Demberg, Implicitness of Discourse Relations, COLING, 2012.

B. Athiwaratkun and A. G. Wilson, Multimodal word distributions, Conference of the Association for Computational Linguistics (ACL), 2017.

J. L. Austin, How to do things with words, 1962.

J. L. Austin, How to do things with words, 1975.

S. Badene, K. Thompson, J. Lorré, and N. Asher, Data programming for learning discourse structure, Proceedings of the 57th Conference of the Association for Computational Linguistics, pp.640-645, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02393478

M. Baroni, R. Bernardi, and R. Zamparelli, Frege in space: A program of compositional distributional semantics, Linguistic Issues in Language Technology (LiLT), vol.9, 2014.

M. Baroni, G. Dinu, and G. Kruszewski, Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.1, pp.238-247, 2014.

R. Barzilay and M. Lapata, Modeling Local Coherence: An Entity-Based Approach, Computational Linguistics, vol.34, issue.1, pp.1-34, 2008.

P. Baudiš, J. Pichl, T. Vyskočil, and J. Šedivý, Sentence Pair Scoring: Towards Unified Framework for Text Comprehension, 2016.

J. Baxter, A model of inductive bias learning, Journal of artificial intelligence research, vol.12, pp.149-198, 2000.

Y. Belinkov and J. Glass, Analysis methods in neural language processing: A survey, Transactions of the Association for Computational Linguistics, vol.7, pp.49-72, 2019.

E. M. Bender, On achieving and evaluating language-independence in NLP, Linguistic Issues in Language Technology, vol.6, issue.3, pp.1-26, 2011.

S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, vol.43, 2009.

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

A. Bordes, X. Glorot, J. Weston, and Y. Bengio, A Semantic Matching Energy Function for Learning with Multi-relational Data, Machine Learning, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00835282

A. Bordes, N. Usunier, A. Garcia-Durán, J. Weston, and O. Yakhnenko, Translating Embeddings for Modeling Multi-Relational Data, Advances in NIPS, vol.26, pp.2787-2795, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00920777

S. Bowman and X. Zhu, Deep learning for natural language inference, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, pp.6-8, 2019.

S. R. Bowman, Modeling natural language semantics in learned representations, 2016.

S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, A large annotated corpus for learning natural language inference, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.632-642, 2015.

S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, A large annotated corpus for learning natural language inference, 2015.

S. R. Bowman, C. D. Manning, and C. Potts, Tree-structured composition in neural networks without tree-structured architectures, Proceedings of the 2015th International Conference on Cognitive Computation: Integrating Neural and Symbolic Approaches, vol.1583, pp.37-42, 2015.

S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz et al., Generating Sentences from a Continuous Space, ICLR, pp.1-13, 2016.

S. Brahma, Unsupervised Learning of Sentence Representations Using Sequence Consistency, 2018.

C. Braud and P. Denis, Learning Connective-based Word Representations for Implicit Discourse Relation Identification, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.203-213, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01397318

A. Bride, T. Van de Cruys, and N. Asher, A generalisation of lexical functions for composition in distributional semantics, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.1, pp.281-291, 2015.
URL : https://hal.archives-ouvertes.fr/hal-02355284

S. Buechel and U. Hahn, EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis, Proceedings of the 15th Conference of the European Chapter, vol.2, pp.578-585, 2017.

W. Carlile, N. Gurrapadi, Z. Ke, and V. Ng, Give Me More Feedback: Annotating Argument Persuasiveness and Related Attributes in Student Essays, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.621-631, 2018.

L. Carlson, D. Marcu, and M. E. Okurowski, Building a Discourse-tagged Corpus in the Framework of Rhetorical Structure Theory, Proceedings of the Second SIGdial Workshop on Discourse, vol.16, pp.1-10, 2001.

R. Caruana, Multitask learning, Machine Learning, vol.28, pp.41-75, 1997.

D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco et al., , 2018.

D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco et al., Universal Sentence Encoder for English, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp.169-174, 2018.

H. Chang, Z. Wang, L. Vilnis, and A. McCallum, Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.485-495, 2018.

D. Chen, J. C. Peterson, and T. L. Griffiths, Evaluating vector-space models of analogy, pp.0-5, 2017.

Q. Chen, X. Zhu, Z. Ling, S. Wei, H. Jiang et al., Enhanced LSTM for Natural Language Inference, pp.1657-1668, 2017.

Y. Chen, B. Perozzi, R. Al-Rfou, and S. Skiena, The expressive power of word embeddings, ArXiv, 2013.

J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv, pp.1-9, 2014.

S. Clark, Vector Space Models of Lexical Meaning, vol.16, pp.493-522, 2015.

R. Collobert and J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, Proceedings of the 25th international conference on Machine learning, pp.160-167, 2008.

A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, pp.681-691, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01897968

A. Conneau, G. Kruszewski, G. Lample, L. Barrault, and M. Baroni, What you can cram into a single vector: Probing sentence embeddings for linguistic properties, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.2126-2136, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01898412

A. Conneau, R. Rinott, G. Lample, A. Williams, S. Bowman et al., XNLI: Evaluating cross-lingual sentence representations, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.2475-2485, 2018.

L. Danlos, M. Colinet, and J. Steinlin, FDTB1: Repérage des connecteurs de discours dans un corpus français, Discours - Revue de linguistique, 2015.

D. Das, T. Scheffler, P. Bourgonje, and M. Stede, Constructing a Lexicon of English Discourse Connectives, Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pp.360-365, 2018.

I. Dasgupta, D. Guo, A. Stuhlmüller, S. J. Gershman, and N. D. Goodman, Evaluating Compositionality in Sentence Embeddings, 2018.

M.-C. de Marneffe, C. D. Manning, and C. Potts, Did It Happen? The Pragmatic Complexity of Veridicality Assessment, Computational Linguistics, vol.38, issue.2, pp.301-333, 2012.

M.-C. de Marneffe, M. Simons, and J. Tonhauser, The CommitmentBank: Investigating projection in naturally occurring discourse, Proceedings of Sinn und Bedeutung, vol.23, pp.107-124, 2019.

G. Deleuze and F. Guattari, A thousand plateaus: Capitalism and schizophrenia, 1988.

D. C. Dennett, The intentional stance, 1989.

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, 2019.

B. Dolan, C. Quirk, and C. Brockett, Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources, COLING 2004, 20th International Conference on Computational Linguistics, Proceedings of the Conference, pp.23-27, 2004.

R. J. Dolan, Emotion, cognition, and behavior, Science, vol.298, pp.1191-1194, 2002.

H. L. Dreyfus, Why Heideggerian AI Failed and How Fixing it Would Require Making it More Heideggerian, Philosophical Psychology, vol.20, issue.2, pp.247-268, 2007.

W. Ferreira and A. Vlachos, Emergent: a novel data-set for stance classification, HLT-NAACL, 2016.

H. Field, Tarski's theory of truth, The Journal of Philosophy, vol.69, issue.13, pp.347-375, 1972.

J. R. Firth, A synopsis of linguistic theory, 1930-1955, Studies in Linguistic Analysis, 1957.

B. Fraser, Pragmatic markers, Pragmatics. Quarterly Publication of the International Pragmatics Association (Ipra), vol.6, issue.2, pp.167-190, 1996.

G. Frege, Grundlagen der Arithmetik, 1884.

R. Fu, J. Guo, B. Qin, W. Che, H. Wang et al., Learning Semantic Hierarchies via Word Embeddings, ACL, pp.1199-1209, 2014.

J. Gauthier and A. Ivanova, Does the brain represent words? An evaluation of brain decoding studies of language understanding, 2018.

L. Getoor and B. Taskar, Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning), 2007.

R. Ghaeini, S. A. Hasan, V. Datla, J. Liu, K. Lee et al., DR-BiLSTM: Dependent reading bidirectional LSTM for natural language inference, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.1460-1469, 2018.

M. Glockner, V. Shwartz, and Y. Goldberg, Breaking NLI Systems with Sentences that Require Simple Lexical Inferences, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp.1-6, 2018.

J. J. Godfrey, E. C. Holliman, and J. McDaniel, SWITCHBOARD: Telephone Speech Corpus for Research and Development, Proceedings of the 1992 IEEE International Conference on Acoustics, Speech and Signal Processing, vol.1, pp.517-520, 1992.

Y. Gong, H. Luo, and J. Zhang, Natural Language Inference over Interaction Space, International Conference on Learning Representations, 2018.

E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov et al., Learning Word Vectors for 157 Languages, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.

M. S. Green, Illocutionary Force and Semantic Content, Linguistics and Philosophy, vol.23, issue.5, pp.435-473, 2000.

H. P. Grice, Logic and Conversation, Syntax and Semantics, vol.3, pp.41-58, 1975.

J. Groenendijk and M. Stokhof, Dynamic predicate logic, Linguistics and philosophy, vol.14, issue.1, pp.39-100, 1991.

S. Gururangan, S. Swayamdipta, O. Levy, R. Schwartz, S. R. Bowman et al., Annotation Artifacts in Natural Language Inference Data, 2018.

M. Halliday, An Introduction to Functional Grammar, 1985.

S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, 2001.

S. Hochreiter and J. Schmidhuber, LSTM can solve hard long time lag problems, Advances in neural information processing systems, pp.473-479, 1997.

S. Hooda and L. Kosseim, Argument Labeling of Explicit Discourse Relations using LSTM Neural Networks, 2017.

E. Hovy and E. Maier, Parsimonious or profligate: How many and which discourse structure relations?, 1992.

J. Howard and S. Ruder, Universal language model fine-tuning for text classification, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.328-339, 2018.

Y. Jernite, S. R. Bowman, and D. Sontag, Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning, 2017.

M. Joshi, E. Choi, O. Levy, D. S. Weld, and L. Zettlemoyer, pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference, NAACL, 2019.

A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, Bag of tricks for efficient text classification, Proceedings of the 15th Conference of the European Chapter, vol.2, pp.427-431, 2017.

H. Kamp, J. Van Genabith, and U. Reyle, Discourse representation theory, Handbook of philosophical logic, pp.125-394, 2011.

L. Karttunen, Presupposition and linguistic context, Theoretical linguistics, vol.1, issue.1-3, pp.181-194, 1974.

A. Kaur and V. Gupta, A survey on sentiment analysis and opinion mining techniques, Journal of Emerging Technologies in Web Intelligence, vol.5, issue.4, pp.367-371, 2013.

D. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, International Conference on Learning Representations, pp.1-13, 2014.

J. Kiros and W. Chan, InferLite: Simple Universal Sentence Representations from Natural Language Inference Data, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.4868-4874, 2018.

R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun et al., Skip-thought vectors, Advances in neural information processing systems, pp.3294-3302, 2015.

A. Knott, A data-driven methodology for motivating a set of coherence relations, 1996.

J. F. Kroll, Comprehension and memory in rapid sequential reading, Attention and Performance VIII, vol.8, p.395, 1980.

T. Lacroix, N. Usunier, and G. Obozinski, Canonical Tensor Decomposition for Knowledge Base Completion, ICML, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01817595

S. Lahiri, SQUINKY! A Corpus of Sentence-level Formality, Informativeness, and Implicature, 2015.

A. Lascarides and N. Asher, Segmented discourse representation theory: Dynamic semantics with discourse structure, Computing meaning, pp.87-124, 2008.

S. Läubli, R. Sennrich, and M. Volk, Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.4791-4796, 2018.

Q. Le and T. Mikolov, Distributed Representations of Sentences and Documents, International Conference on Machine Learning (ICML 2014), vol.32, pp.1188-1196, 2014.

H. Levesque, E. Davis, and L. Morgenstern, The Winograd schema challenge, Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning, 2012.

O. Levy and Y. Goldberg, Linguistic Regularities in Sparse and Explicit Word Representations, Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pp.171-180, 2014.

O. Levy and Y. Goldberg, Neural word embedding as implicit matrix factorization, Advances in Neural Information Processing Systems, vol.27, pp.2177-2185, 2014.

O. Levy, S. Remus, C. Biemann, and I. Dagan, Do Supervised Distributional Methods Really Learn Lexical Inference Relations?, NAACL 2015, pp.970-976, 2015.

H. Liu, Y. Wu, and Y. Yang, Analogical Inference for Multi-Relational Embeddings, 2017.

X. Liu, P. He, W. Chen, and J. Gao, Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding, 2019.

Y. P. Liu, S. Li, X. Zhang, and Z. Sui, Implicit discourse relation classification via multitask neural networks, ArXiv, 2016.

L. Logeswaran, H. Lee, and D. Radev, Sentence Ordering using Recurrent Neural Networks, pp.1-15, 2016.

L. Logeswaran, H. Lee, and D. R. Radev, Sentence Ordering and Coherence Modeling using Recurrent Neural Networks, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), pp.5285-5292, 2018.

E. Malmi, D. Pighin, S. Krause, and M. Kozhevnikov, Automatic Prediction of Discourse Connectives, Proceedings of the 11th Language Resources and Evaluation Conference, 2018.

W. C. Mann and S. A. Thompson, Rhetorical structure theory: Toward a functional theory of text organization, Text-interdisciplinary Journal for the Study of Discourse, vol.8, issue.3, pp.243-281, 1988.

C. Mantzavinos, Hermeneutics, 2016.

D. Marcu and A. Echihabi, An unsupervised approach to recognizing discourse relations, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp.368-375, 2002.

C. May, A. Wang, S. Bordia, S. R. Bowman, and R. Rudinger, On measuring social biases in sentence encoders, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.622-628, 2019.

B. McCann, N. S. Keskar, C. Xiong, and R. Socher, The natural language decathlon: Multitask learning as question answering, 2018.

R. T. McCoy, E. Pavlick, and T. Linzen, Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference, ACL, 2019.

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Distributed Representations of Words and Phrases and their Compositionality, NIPS, pp.1-9, 2013.

J. Mitchell and M. Lapata, Vector-based models of semantic composition, Proceedings of ACL-08: HLT, pp.236-244, 2008.

M. Morey, P. Muller, and N. Asher, How much progress have we made on RST discourse parsing? A replication study of recent results on the RST-DT, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp.1319-1324, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01650251

L. Mou, R. Men, G. Li, Y. Xu, L. Zhang et al., Natural Language Inference by Tree-Based Convolution and Heuristic Matching, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.130-136, 2016.

J. Mueller and A. Thyagarajan, Siamese Recurrent Architectures for Learning Sentence Similarity, AAAI, pp.2786-2792, 2016.

N. Nandakumar, T. Baldwin, and B. Salehi, How well do embedding models capture non-compositionality? A view from multiword expressions, Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, pp.27-34, 2019.

N. Nangia and S. R. Bowman, Human vs. muppet: A conservative estimate of human performance on the GLUE benchmark, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.4566-4575, 2019.

M. Nickel, V. Tresp, and H. Kriegel, A Three-Way Model for Collective Learning on Multi-Relational Data, ICML, pp.809-816, 2011.

A. Nie, E. Bennett, and N. Goodman, DisSent: Learning sentence representations from explicit discourse relations, pp.4497-4510, 2019.

T. Niven and H. Kao, Probing neural network comprehension of natural language arguments, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.4658-4664, 2019.

S. Oraby, V. Harrison, L. Reed, E. Hernandez, E. Riloff et al., Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue, Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp.31-41, 2016.

B. Pan, Y. Yang, Z. Zhao, Y. Zhuang, D. Cai et al., Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.989-999, 2018.

B. Pan, Y. Yang, Z. Zhao, Y. Zhuang, D. Cai et al., Discourse marker augmented network with reinforcement learning for natural language inference, ACL, 2018.

A. Panchenko, E. Ruppert, S. Faralli, S. P. Ponzetto, and C. Biemann, Building a Web-Scale Dependency-Parsed Corpus from Common Crawl, pp.1816-1823, 2017.

B. Pang and L. Lee, A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts, Proceedings of the 42nd annual meeting on Association for Computational Linguistics, p.271, 2004.

J. Park and C. Cardie, Identifying appropriate support for propositions in online user comments, Proceedings of the first workshop on argumentation mining, pp.29-38, 2014.

R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, How to construct deep recurrent neural networks, Proceedings of the Second International Conference on Learning Representations, 2014.

J. Pennington, R. Socher, and C. Manning, GloVe: Global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1532-1543, 2014.

C. S. Perone, R. Silveira, and T. S. Paula, Evaluation of sentence embeddings in downstream and linguistic probing tasks, 2018.

M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep contextualized word representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.2227-2237, 2018.

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep contextualized word representations, Proc. of NAACL, 2018.

J. Pfeffer, Understanding the role of power in decision making, Classics of Organization Theory, pp.137-154, 1981.

J. Phang, T. Févry, and S. R. Bowman, Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks, CoRR, 2018.

R. Piedeleu, D. Kartsaklis, B. Coecke, and M. Sadrzadeh, Open system categorical quantum semantics in natural language processing, 2015.

E. Pitler, A. Louis, and A. Nenkova, Automatic sense prediction for implicit discourse relations in text, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol.2, pp.683-691, 2009.

E. Pitler and A. Nenkova, Using Syntax to Disambiguate Explicit Discourse Connectives in Text, ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp.13-16, 2009.

E. Pitler, M. Raghupathy, H. Mehta, A. Nenkova, A. Lee et al., Easily Identifiable Discourse Relations, Coling 2008: Companion volume: Posters, pp.87-90, 2008.

E. Pitler, M. Raghupathy, H. Mehta, A. Nenkova, A. Lee et al., Easily identifiable discourse relations, p.884, 2008.

A. Poliak, A. Haldar, R. Rudinger, J. E. Hu, E. Pavlick et al., Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.67-81, 2018.

A. Poliak, J. Naradowsky, A. Haldar, R. Rudinger, and B. Van Durme, Hypothesis Only Baselines in Natural Language Inference, Proceedings of the 7th Joint Conference on Lexical and Computational Semantics, pp.180-191, 2018.

R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo et al., The Penn Discourse TreeBank 2.0, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), 2008.

R. Prasad, K. F. Riley, and A. Lee, Towards Full Text Shallow Discourse Relation Annotation: Experiments with Cross-Paragraph, 2009.

J. Pustejovsky, Co-compositionality in grammar, The Oxford Handbook of Compositionality, pp.371-382, 2012.

J. Pérez, J. Marinković, and P. Barceló, On the Turing completeness of modern neural network architectures, International Conference on Learning Representations, 2019.

A. Radford, Improving Language Understanding by Generative Pre-Training, 2018.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei et al., Language models are unsupervised multitask learners, OpenAI Blog, issue.8, p.1, 2019.

M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, and J. Sohl-Dickstein, On the expressive power of deep neural networks, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2847-2854, 2017.

E. Ribeiro, R. Ribeiro, and D. M. de Matos, The influence of context on dialogue act recognition, 2015.

T. Rocktäschel, E. Grefenstette, K. M. Hermann, T. Kočiský, and P. Blunsom, Reasoning about entailment with neural attention, 2015.

C. Roze, L. Danlos, and P. Muller, LEXCONN: A French Lexicon of Discourse Connectives, Discours, issue.10, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00511615

A. Rutherford and N. Xue, Improving the Inference of Implicit Discourse Relations via Classifying Explicit Discourse Connectives, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.799-808, 2015.

M. Schuster and K. K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, vol.45, issue.11, pp.2673-2681, 1997.

J. Searle, The Chinese room argument, Encyclopedia of Cognitive Science, 2006.

J. R. Searle, F. Kiefer, and M. Bierwisch, Speech act theory and pragmatics, vol.10, 1980.

J. R. Searle and S. Willis, Intentionality: An essay in the philosophy of mind, 1983.

M. J. Seo, A. Kembhavi, A. Farhadi, and H. Hajishirzi, Bidirectional Attention Flow for Machine Comprehension, 5th International Conference on Learning Representations, 2017.

M. Seok, H. Song, C. Park, J. Kim, and Y. Kim, Named Entity Recognition using Word Embedding as a Feature, International Journal of Software Engineering and Its Applications, vol.10, issue.2, pp.93-104, 2016.

D. Shen, G. Wang, W. Wang, M. R. Min, Q. Su et al., Baseline needs more love: On simple word-embedding-based models and associated pooling mechanisms, ACL, 2018.

E. Shriberg, R. Dhillon, S. Bhagat, J. Ang, and H. Carvey, The ICSI meeting recorder dialog act (MRDA) corpus, Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL, 2004.

V. Shwartz and I. Dagan, Still a pain in the neck: Evaluating text representations on lexical composition, Transactions of the Association for Computational Linguistics, vol.7, pp.403-419, 2019.

D. Sileo, T. Van de Cruys, C. Pradel, and P. Muller, Mining discourse markers for unsupervised sentence representation learning, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.3477-3486, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02397473

R. Socher, D. Chen, C. D. Manning, and A. Ng, Reasoning With Neural Tensor Networks for Knowledge Base Completion, Neural Information Processing Systems, pp.926-934, 2013.

C. Sporleder and A. Lascarides, Using automatically labelled examples to classify rhetorical relations: an assessment, Natural Language Engineering, vol.14, issue.3, pp.369-416, 2008.

M. Stede, DiMLex: A Lexical Approach to Discourse Markers, Exploring the Lexicon - Theory and Computation, Edizioni dell'Orso, 2002.

S. Subramanian, A. Trischler, Y. Bengio, and C. J. Pal, Learning general purpose distributed sentence representations via large scale multi-task learning, International Conference on Learning Representations, 2018.

Y. Sun, S. Wang, Y. Li, S. Feng, H. Tian et al., ERNIE 2.0: A continual pre-training framework for language understanding, 2019.

Z. G. Szabó, Compositionality, 2017.

K. S. Tai, R. Socher, and C. D. Manning, Improved semantic representations from tree-structured long short-term memory networks, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, vol.1, pp.1556-1566, 2015.

Y. Tamaazousti, On The Universality of Visual and Multimodal Representations, 2018.
URL : https://hal.archives-ouvertes.fr/tel-01828934

Y. Tay, A. T. Luu, and S. C. Hui, Compare, compress and propagate: Enhancing neural architectures with alignment factorization for natural language inference, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.1565-1575, 2018.

F. Torabi Asr, R. Zinkov, and M. Jones, Querying word embeddings for similarity and relatedness, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.675-684, 2018.

T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard, Complex Embeddings for Simple Link Prediction, Proceedings of the 33rd International Conference on Machine Learning, vol.48, 2016.

T. Van de Cruys, Mining for Meaning. The Extraction of Lexico-Semantic Knowledge from Text, 2010.

L. van der Maaten and G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is all you need, Advances in neural information processing systems, pp.5998-6008, 2017.

L. Vilnis and A. McCallum, Word Representations via Gaussian Embedding, ICLR, p.12, 2015.

A. Wang, J. Hula, P. Xia, R. Pappagari, R. T. McCoy et al., Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling, 2019.

A. Wang, Y. Pruksachatkun, N. Nangia, A. Singh, J. Michael et al., SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019.

A. Wang, A. Singh, J. Michael, F. Hill, O. Levy et al., GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp.353-355, 2018.

A. Wang, A. Singh, J. Michael, F. Hill, O. Levy et al., GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, International Conference on Learning Representations, 2019.

A. Wang, I. F. Tenney, Y. Pruksachatkun, K. Yu, J. Hula et al., , 2019.

S. Watanabe, Theorem of the ugly duckling, Pattern Recognition: Human and Mechanical, 1985.

S. Wermter, E. Riloff, and G. Scheler, Connectionist, statistical and symbolic approaches to learning for natural language processing, vol.1040, 1996.

J. Wieting, M. Bansal, K. Gimpel, and K. Livescu, Towards Universal Paraphrastic Sentence Embeddings, CoRR, 2015.

A. Williams, N. Nangia, and S. Bowman, A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.1112-1122, 2018.

A. Williams, N. Nangia, and S. Bowman, A broad-coverage challenge corpus for sentence understanding through inference, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.1112-1122, 2018.

L. Wittgenstein, Tractatus logico-philosophicus, 2013.

M. J. Wolf, K. Miller, and F. S. Grodzinsky, Why we should have seen that coming: comments on Microsoft's Tay experiment, and wider implications, ACM SIGCAS Computers and Society, vol.47, issue.3, pp.54-64, 2017.

A. Rutherford, V. Demberg, and N. Xue, A Systematic Study of Neural Discourse Models for Implicit Discourse Relation, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol.1, pp.281-291, 2017.

M. C. Yang, D. G. Lee, S. Y. Park, and H. C. Rim, Knowledge-based question answering using the semantic embedding space, Expert Systems with Applications, vol.42, issue.23, pp.9086-9104, 2015.

Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov et al., XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019.

T. Young, D. Hazarika, S. Poria, and E. Cambria, Recent trends in deep learning based natural language processing, IEEE Computational Intelligence Magazine, vol.13, pp.55-75, 2018.

A. Zeldes, The GUM corpus: Creating multilayer resources in the classroom, Language Resources and Evaluation, vol.51, issue.3, pp.581-612, 2017.

D. Zeyrek and B. Webber, A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus, Proceedings of IJCNLP, 2008.

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, 2016.

Y. Zhou, J. Lu, J. Zhang, and N. Xue, Chinese Discourse Treebank 0.5, pp.2014-2035, 2014.