Hindi: 163221 Hindi word forms from 68788 lemmas

Marathi: 72312 Marathi word forms from 6026 lemmas

IV.2.2.1.3 Bilingual lexical resources

We obtained the following bilingual resources:

HI-UNL dictionary from CFILT-IITB, with 136710 Universal Words and 65156 lemmas

Hindi-English Shabdkosh dictionary, with 22756 entries

Hindi-English Apertium dictionary, with 30463 entries

IV.2.2.2 Available resources for tweets in Japanese

IV.2.2.2.1 Tweet resources

For Japanese tweets, we use a collection of 3.2M tweets collected by Prof. Kitamoto (NII, Tokyo) during the whole of 2014.

We used samples from this collection to run evaluation experiments on Japanese tweets within SUFT-1.

IV.2.2.2.2 Lexical resources for Japanese-French and Japanese-English

Resources available on the JIBIKI/PAPILLON platform have been used for SUFT-1:

Japanese-French: the CESSELIN dictionary, with 82K entries

Japanese-English: the JMDICT dictionary, with 48K entries

Bibliography

S. Alansary, M. Nagi, and N. Adly, The Universal Networking Language in action in English-Arabic machine translation, Proceedings of the 9th Egyptian Society of Language Engineering Conference on Language Engineering, pp.23-24, 2009.

P. André, M. S. Bernstein, and K. Luther, Who Gives A Tweet? Evaluating Microblog Content Value, Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (CSCW 2012), pp.471-474, 2012.

Z. Bahramiana, A. Abbaspoura, and R. , An Ontology-Based Tourism Recommender System Based on Spreading Activation Model, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (ISPRS Archives), pp.83-90, 2015.
DOI : 10.5194/isprsarchives-XL-1-W5-83-2015

M. Bapat, H. Gune, and P. Bhattacharyya, A Paradigm-Based Finite State Morphological Analyzer for Marathi, Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing (WSSANLP), collocated with COLING 2010, Beijing, pp.26-34, 2010.

S. Bergsma, P. McNamee, M. Bagdouri, C. Fink, and T. Wilson, Language Identification for Creating Language-Specific Twitter Collections, Proceedings of the 2012 Workshop on Language in Social Media (LSM 2012), pp.65-74, 2012.

P. Bhattacharyya, Facilitating Multi-Lingual Sense Annotation: Human Mediated Lemmatizer, Global Wordnet Conference, 2014.

T. Bögel, M. Butt, A. Hautli, and S. Sulger, Developing a finite-state morphological analyzer for Urdu and Hindi, Finite State Methods and Natural Language Processing, 10, 2007.

C. Boitet, GETA's MT methodology and its current development towards personal networking communication and speech translation in the context of the UNL and C-STAR projects, Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING), p.35, 1997.

C. Boitet, V. Bellynck, M. Mangeot, and C. Ramisch, Towards Higher Quality Internal and Outside Multilingualization of Web Sites, Proceedings of the Summer Workshop on Ontology, NLP, Personalization and IE/IR (ONII-08), p.8, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00968752

C. Boitet, H. Blanchon, M. Seligman, and V. Bellynck, Evolution of MT with the Web, Proceedings of the International Conference "Machine Translation 25 Years On", pp.1-13, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00959252

C. Boitet, M. Mangeot, and G. Sérasset, The PAPILLON project, Proceedings of the 2nd Workshop on NLP and XML (NLPXML '02), pp.93-96, 2002.
DOI : 10.3115/1118808.1118813

URL : https://hal.archives-ouvertes.fr/hal-00965824

S. Bostandjiev, J. O'Donovan, and T. Höllerer, TasteWeights, Proceedings of the sixth ACM conference on Recommender systems, RecSys '12, pp.35-42, 2012.
DOI : 10.1145/2365952.2365964

R. Burke, Hybrid web recommender systems, The Adaptive Web, pp.377-408, 2007.
DOI : 10.1007/978-3-540-72079-9_12

URL : http://www.inf.unibz.it/~ricci/ISR/papers/burke07.pdf

J. Chandioux, 10 ans de METEO (MD), Proceedings of Traduction Assistée par Ordinateur: Perspectives technologiques, industrielles et économiques envisageables à l'Horizon 1990: l'offre, la demande, les marchés et les évolutions en cours, pp.169-172, 1989.

A. Chatterjee, S. R. Joshi, M. M. Khapra, and P. Bhattacharyya, Introduction to Tools for IndoWordNet and Word Sense Disambiguation, Proceedings of 3rd IndoWordNet workshop, 2010.

J. Chauché, The ATEF and CETA systems, American Journal of Computational Linguistics, pp.17-21, 1975.

J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi, Short and tweet, Proceedings of the 28th international conference on Human factors in computing systems, CHI '10, pp.1185-1194, 2010.
DOI : 10.1145/1753326.1753503

Y. Chen, L. Wang, C. Boitet, and X. Shi, On-going Cooperative Research towards Developing Economy-Oriented Chinese-French SMT Systems with a New SMT Framework, Proceedings of the 21st Traitement Automatique des Langues Naturelles (TALN), Marseille, France, pp.401-406, 2014.

G. Chittaranjan, Y. Vyas, K. Bali, and M. Choudhury, Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System, Proceedings of the First Workshop on Computational Approaches to Code Switching, pp.73-79, 2014.
DOI : 10.3115/v1/W14-3908

R. Dabre, A. Amberkar, and P. Bhattacharyya, Morphological Analyzer for Affix Stacking Languages: A Case Study of Marathi, Proceedings of the 24th International Conference on Computational Linguistics, 2012.

M. Daoud, Usage of non-conventional resources and contributive methods to bridge the terminological gap between languages by developing multilingual "preterminologies", 2010.
URL : https://hal.archives-ouvertes.fr/tel-00583682

H. Depraetere and J. Van de Walle, Bologna Translation Service: An enabler for international profiling and student mobility, Proceedings of the 6th International Technology, Education and Development Conference, pp.5907-5912, 2012.

H. Depraetere, J. Van den Bogaert, and J. Van de Walle, Bologna Translation Service: Online translation of course syllabi and study programmes in English, Proceedings of the 15th Conference of the European Association for Machine Translation, pp.29-34, 2011.

L. Derczynski, A. Ritter, S. Clark, and K. Bontcheva, Twitter part-of-speech tagging for all: Overcoming sparse and noisy data, Proceedings of the Recent Advances in Natural Language Processing, pp.198-206, 2013.

U. Desai and M. Ramsay-Brijball, Tracing Gujarati Language Development Philologically and Sociolinguistically, Alternation: International Journal for the Study of Southern African Literature and Languages, pp.308-324, 2004.

R. G. Devi, P. V. Veena, A. M. Kumar, and K. P. Soman, AMRITA_CEN@FIRE 2016: Code-Mix Entity Extraction for Hindi-English and Tamil-English Tweets, Forum for Information Retrieval and Evaluation, p.5, 2016.

A. Dey and P. Fung, A Hindi-English Code-Switching Corpus Code-Switching in Indian Culture, Proceedings of the 9th Language Resources and Evaluation Conference, pp.2410-2413, 2014.

M. D. Ekstrand, J. T. Riedl, and J. A. Konstan, Collaborative Filtering Recommender Systems, Foundations and Trends® in Human-Computer Interaction, pp.81-173, 2010.
DOI : 10.1561/1100000009

A. Falaise, D. Rouquet, D. Schwab, H. Blanchon, and C. Boitet, Ontology driven content extraction using interlingual annotation of texts in the OMNIA project, Proceedings of the 4th International Workshop on Cross-Lingual Information Access at COLING'10, pp.52-60, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00959171

A. Farzindar and D. Inkpen, Natural Language Processing for Social Media (G. Hirst, Ed.), Morgan and Claypool, 2015.

T. Finin, W. Murnane, A. Karandikar, N. Keller, J. Martineau et al., Annotating Named Entities in Twitter Data with Crowdsourcing, Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pp.80-88, 2010.

M. Fleischman and E. Hovy, Recommendations without user preferences, Proceedings of the 8th international conference on Intelligent user interfaces, IUI '03, pp.242-244, 2003.
DOI : 10.1145/604045.604087

G. Francopoulo, N. Bel, M. George, N. Calzolari, M. Monachini et al., Multilingual resources for NLP in the lexical markup framework (LMF), Language Resources and Evaluation, vol.13, issue.4, pp.57-70, 2009.
DOI : 10.1007/s10579-008-9077-5

B. Gambäck, G. Eriksson, and A. Fourla, Natural language processing at the school of information studies for Africa, Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, TeachNLP '05, 2005.
DOI : 10.3115/1627291.1627303

M. Gauthier, A. Guille, F. Rico, and A. Deseille, Text mining and Twitter to analyze British swearing habits, Proceedings of International Conference on Twitter for Research, pp.28-42, 2015.

K. Gimpel, N. Schneider, B. O'Connor, D. Das, D. Mills et al., Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short Papers, pp.42-47, 2011.
DOI : 10.21236/ADA547371

J. Goldsmith, Unsupervised Learning of the Morphology of a Natural Language, Computational Linguistics, vol.27, issue.2, pp.153-198, 2001.

J. Goldsmith, An algorithm for the unsupervised learning of morphology, Natural Language Engineering, vol.12, issue.4, 2006.
DOI : 10.1017/S1351324905004055

F. Gotti, P. Langlais, and A. Farzindar, Translating Government Agencies' Tweet Feeds: Specificities, Problems and (a few) Solutions, pp.80-89, 2013.


F. Gotti, P. Langlais, and A. Farzindar, Hashtag Occurrences, Layout and Translation: A Corpus-driven Analysis of Tweets Published by the Canadian Government, Proceedings of the 9th Language Resources and Evaluation Conference, pp.2254-2261, 2014.

P. Goyal, M. R. Mital, A. Mukerjee, A. M. Raina, D. Sharma et al., A bilingual parser for Hindi, English and code-switching structures, Proceedings of the European Association for Computational Linguistics, p.15, 2003.

S. Green, J. Heer, and C. D. Manning, The efficacy of human post-editing for language translation, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '13, pp.439-448, 2013.
DOI : 10.1145/2470654.2470718

J. Guilbaud, C. Boitet, and V. Berment, Un analyseur morphologique étendu de l'allemand traitant les formes verbales à particule séparée, Proceedings of TALN-RÉCITAL, pp.755-763, 2013.

N. Y. Habash, Introduction to Arabic Natural Language Processing, Synthesis Lectures on Human Language Technologies, Morgan and Claypool, Toronto, vol.310, pp.1-187, 2010.

B. Han, P. Cook, and T. Baldwin, Text-Based Twitter User Geolocation Prediction, Journal of Artificial Intelligence Research, vol.49, pp.451-500, 2014.
DOI : 10.1613/jair.4200

J. Hannon, M. Bennett, and B. Smyth, Recommending twitter users to follow using content and collaborative filtering approaches, Proceedings of the fourth ACM conference on Recommender systems, RecSys '10, pp.199-206, 2010.
DOI : 10.1145/1864708.1864746

B. Harris, The Importance of Natural Translation, Working Papers in Bilingualism, Toronto, pp.96-114, 1976.

B. Harris and T. Hofmann, Pidgin Translation, Meta, vol.15, issue.2, pp.71-87, 1970.

C. Huynh, Des suites de test pour la TA à un système d'exploitation de corpus alignés de documents et métadocuments multilingues, multiannotés et multimédia (Doctoral dissertation, Université Grenoble-Alpes), 2010.

C. Huynh, C. Boitet, and H. Blanchon, SECTra_w.1: An online collaborative system for evaluating, post-editing and presenting MT translation corpora, Proceedings of the Sixth International Conference on Language Resources and Evaluation, pp.2571-2576, 2008.

P. Isabelle, Machine Translation at the TAUM group, Proceedings of Machine Translation Today: The State of the Art (Margaret K, pp.247-277, 1987.

H. Isahara, JEIDA's test-sets for quality evaluation of MT systems: technical evaluation from the developer's point of view, Proceedings of MT Summit V, 1995.

L. E. Jehl, Machine Translation for Twitter, p.5317, 2010.

L. Jehl, F. Hieber, and S. Riezler, Twitter Translation using Translation-Based Cross-Lingual Retrieval, pp.410-421, 2012.


R. Kalitvianski, C. Boitet, and V. Bellynck, Collaborative computer-assisted translation applied to pedagogical documents and literary works, Proceedings of the 24th International Conference on Computational Linguistics, pp.255-260, 2012.

S. Kim, I. Weber, L. Wei, and A. Oh, Sociolinguistic analysis of Twitter in multilingual societies, Proceedings of the 25th ACM conference on Hypertext and social media, HT '14, pp.243-248, 2014.
DOI : 10.1145/2631775.2631824

A. Krishn, R. S. Guha, and A. Mukherjee, Unsupervised Morphological Analysis of Hindi, Indian Institute of Technology Kanpur, 2012.

G. H. Kumar and T. Seyedmahmoud, Movie Recommendation based on Users' Tweets, International Journal of Computer Applications, vol.141, pp.34-36, 2016.

S. M. Kywe, E. Lim, and F. Zhu, A Survey of Recommender Systems in Twitter, Proceedings of the Research Collection School Of Information Systems, pp.420-433, 2012.
DOI : 10.1007/978-3-642-35386-4_31

L. Gavrilova, Analysis of Users' Interest Based on Tweets, Computational Science and Its Applications, vol.1, pp.354978-354981, 2006.

M. Lafourcade and J. Chauché, Ficus - un agent dictionnaire coopératif et extensible, NLP+IA'98, 1998.

J. B. Larsen, Content-based Recommender Systems (Doctoral dissertation), 2013.

W. Ling, G. Xiang, C. Dyer, A. Black, and I. Trancoso, Microblogs as Parallel Corpora, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp.176-186, 2013.

P. Lops, M. de Gemmis, and G. Semeraro, Content-based Recommender Systems: State of the Art and Trends, Recommender Systems Handbook, pp.73-105, 2011.

M. Lui and T. Baldwin, Accurate Language Identification of Twitter Messages, Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM), pp.17-25, 2014.
DOI : 10.3115/v1/W14-1303

J. Mahmud, J. Nichols, and C. Drews, Where Is This Tweet From? Inferring Home Locations of Twitter Users, Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, pp.511-514, 2012.

J. Mahmud, J. Nichols, and C. Drews, Home Location Identification of Twitter Users, ACM Transactions on Intelligent Systems and Technology, vol.5, issue.3, pp.47-69, 2014.

A. Malik, C. Boitet, and P. Bhattacharyya, Hindi Urdu machine transliteration using finite-state transducers, Proceedings of the 22nd International Conference on Computational Linguistics, COLING '08, pp.537-544, 2008.
DOI : 10.3115/1599081.1599149

URL : https://hal.archives-ouvertes.fr/hal-01002349

J. Malmivuo and R. Plonsey, Bioelectromagnetism: principles and applications of bioelectric and biomagnetic fields, 1995.

M. Mangeot-Nagata, Collaborative Construction of a Good Quality, Broad Coverage and Copyright Free Japanese-French Dictionary, International Journal of Lexicography, vol.30, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01712271

R. Mehrotra, S. Sanner, W. Buntine, and L. Xie, Improving LDA topic models for microblogs via tweet pooling and automatic labeling, Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '13, pp.889-892, 2013.
DOI : 10.1145/2484028.2484166

P. Melville, R. J. Mooney, and R. Nagarajan, Content-boosted collaborative filtering for improved recommendations, Proceedings of the 18th National Conference on Artificial Intelligence (AAAI), pp.187-192, 2002.

R. Mesthrie, Indian Languages in Africa: a field report, 1900-1995, Yearbook of South Asian Languages, 1997.

S. Mukherjee, A. Malu, A. R. Balamurali, and P. Bhattacharyya, TwiSent, Proceedings of the 21st ACM international conference on Information and knowledge management, CIKM '12, pp.2531-2534, 2012.
DOI : 10.1145/2396761.2398684

M. Nagao, J. Tsujii, and J. Nakamura, The Japanese Government Project for Machine Translation, Computational Linguistics, vol.11, issue.2-3, pp.91-110, 1985.

V. Nastase and M. Strube, Decoding Wikipedia Categories for Knowledge Acquisition, Proceedings of the 23rd national conference on Artificial intelligence, pp.1219-1224, 2008.

A. Nayan, B. R. Rao, P. Singh, S. Sanyal, and R. Sanyal, Named Entity Recognition for Indian Languages, Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pp.97-104, 2008.

G. Neubig and K. Duh, How Much Is Said in a Tweet? A Multilingual, Information-theoretic Perspective, AAAI Spring Symposium: Analyzing Microtext, pp.32-39, 2013.

D. P. Nguyen and M. Ishizuka, A statistical approach for universal networking language-based relation extraction, 2006 International Conference on Research, Innovation and Vision for the Future, pp.153-160, 2006.
DOI : 10.1109/RIVF.2006.1696432

H. Nguyen, C. Boitet, and G. Sérasset, PIVAX, an online contributive lexical database for heterogeneous MT systems using a lexical pivot, Proceedings of the International Symposium on Natural Language Processing, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00965463

M. Nguyen, A. Kitamoto, and T. Nguyen, TSum4act: A Framework for Retrieving and Summarizing Actionable Tweets During a Disaster for Reaction, The 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp.978-981, 2015.
DOI : 10.1007/978-3-319-18032-8_6

B. Nikolova and I. Nenova, An Automated System for Term Services, Proceedings of the 9th Conference on Computational Linguistics, pp.265-270, 1982.

H. Nomura and H. Isahara, Evaluation Surveys: The JEIDA Methodology and Survey, MT Evaluation: Basis for Future Directions, Proceedings of a workshop sponsored by the National Science Foundation, pp.11-12, 1992.

N. Oostdijk, MapTwitter tribal language(s), Proceedings of International Conference on Twitter for Research, pp.65-87, 2015.

A. Pak and P. Paroubek, Twitter as a Corpus for Sentiment Analysis and Opinion Mining, Proceedings of the Seventh Conference on International Language Resources and Evaluation, pp.1320-1326, 2010.

D. Park, H. Kim, I. Choi, and J. Kim, A Literature Review and Classification of Recommender Systems on Academic Journals, Expert Systems with Applications, vol.39, issue.11, pp.139-152, 2012.

M. Patawar and M. Potey, Named Entity Recognition from Indian tweets using Conditional Random Fields based Approach, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol.5, issue.5, p.5, 2016.

R. B. Patel, Etymological and Phonetic Changes among Foreign words in Kiswahili, Journal of the Institute of Swahili Research, vol.37, issue.1, p.6, 1967.

M. Pennacchiotti and A. Popescu, A Machine Learning Approach to Twitter User Classification, Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, pp.281-288, 2011.

J. Pietrzak, A. Jauregi, J. Van de Walle, and A. Eriksson, Improving access to educational courses via automatic machine translation - New developments in post-editing, Proceedings of the INTED 2013 Conference, pp.5521-5529, 2013.

B. Pouliquen, C. Elizalde, M. Junczys-Dowmunt, C. Mazenc, and J. García-Verdugo, Large-scale multiple language translation accelerator at the United Nations, Proceedings of the 14th Machine Translation Summit, pp.345-352, 2013.

B. Pouliquen and C. Mazenc, Automatic translation tools at WIPO, ASLIB Translating and the Computer, pp.17-18, 2011.

B. Pouliquen and C. Mazenc, COPPA, CLIR and TAPTA: Three tools to assist in overcoming the patent language barrier at WIPO, Proceedings of the 13th Machine Translation Summit, pp.24-30, 2011.

J. Probyn, Study sheds light on how Africans use Twitter, http://www.howwemadeitinafrica.com/study-sheds-light-africans-use-twitter, 2016.

D. Ramage, S. Dumais, and D. Liebling, Characterizing Microblogs with Topic Models, Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, pp.1-8, 2010.

C. Ramisch, A. Villavicencio, and C. Boitet, Web-based and combined language models: a case study on noun compound identification, Proceedings of the 23rd International Conference on Computational Linguistics, pp.1041-1049, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01002431

A. Ritter, S. Clark, and O. Etzioni, Named entity recognition in tweets: an experimental study, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1524-1534, 2011.

L. Rolinger, Edible Identities: Food, Cultural Mixing and the Making of Identities on the Swahili coast, 2009.

D. Rouquet and H. Nguyen, Interlingual annotation of texts in the OMNIA project, Proceedings of the 4th Language and Technology Conference, pp.290-294, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00959171

S. K. Saha, S. Sarkar, and P. Mitra, Gazetteer preparation for named entity recognition in Indian languages, 6th Workshop on Asian Language Resources collocated with International Joint Conference on Natural Language Processing, pp.9-16, 2008.

I. Saito, K. Sadamitsu, H. Asano, and Y. Matsuo, Morphological Analysis for Japanese Noisy Text based on Character-level and Word-level Normalization, Proceedings of the 25th International Conference on Computational Linguistics (COLING): Technical Papers, pp.1773-1782, 2014.

B. Sankaran, K. Bali, M. Choudhury et al., A Common Parts-of-Speech Tagset Framework for Indian Languages, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), pp.1331-1337, 2008.

R. Sasano, S. Kurohashi, and M. Okumura, A Simple Approach to Unknown Word Processing in Japanese Morphological Analysis, Journal of Natural Language Processing, vol.21, issue.6, pp.1183-1205, 2014.
DOI : 10.5715/jnlp.21.1183

K. P. Scannell, The Crúbadán Project: Corpus building for under-resourced languages, Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop, pp.5-15, 2007.

M. Seligman, C. Boitet, and B. Meddeb-Hamrouni, Transforming lattices into nondeterministic automata with optional null arcs, Proceedings of the 36th annual meeting on Association for Computational Linguistics, pp.1205-1211, 1998.

R. Shah, Resource collection in the form of dictionaries, named-entities and tweets in Hindi and Marathi: Experiments for building a recommender using tweets, 2016.

R. Shah and C. Boitet, Understandability of machine-translated Hindi tweets before and after post-editing: Perspectives for a recommender system, CEUR Workshop Proceedings, pp.1445-1489, 2015.
URL : https://hal.archives-ouvertes.fr/hal-02014320

R. Shah, C. Boitet, and P. Bhattacharyya, Building a recommender system using multilingual multiscript tweets, Proceedings of the International Conference on Twitter Research (CTR), pp.160-170, 2015.

R. Sharma, M. Gupta, A. Agarwal, and P. Bhattacharyya, Adjective Intensity and Sentiment Analysis, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
DOI : 10.18653/v1/D15-1300

A. Si, A diachronic investigation of Hindi-English code-switching, using Bollywood film scripts, International Journal of Bilingualism, vol.15, issue.4, pp.388-407, 2011.

S. Singh and V. Sarma, Verbal Inflection in Hindi: A Distributed Morphology Approach, Proceedings of 25th Pacific Asia Conference on Language, Information and Computation, pp.283-292, 2011.

A. Tchechmedjiev, J. Goulian, D. Schwab, and G. Sérasset, Parameter estimation under uncertainty with Simulated Annealing applied to an ant colony based probabilistic WSD algorithm, Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology, pp.109-124, 2012.

H. Terdalkar and S. Agarwal, Romanagari Detection in Twitter, pp.1-13, 2015.

J. Tiedemann, Parallel Data, Tools and Interfaces in OPUS, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), pp.2214-2218, 2012.

J. Tiedemann and L. Nygaard, OPUS - an open source parallel corpus, Proceedings of the 14th Nordic Conference on Computational Linguistics (NoDaLiDa), Reykjavik, Iceland, pp.1-8, 2003.

J. Tiedemann and L. Nygaard, The OPUS corpus - parallel & free, Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC), 2004.

E. Tromp and M. Pechenizkiy, Graph-based N-gram language identification on short texts, Proceedings of the 20th annual Belgian-Dutch Conference on Machine Learning, pp.27-34, 2011.

J. Van de Walle, H. Depraetere, and J. Pietrzak, Bologna Translation Service: High-quality automated translation of study programmes into English, Proceedings of the EDULEARN 2012 Conference, pp.5831-5835, 2012.

J. Van de Walle, H. Depraetere, and J. Pietrzak, Bologna Translation Service: Making study programmes accessible throughout Europe by means of high-quality automated translation, Proceedings of ICERI 2012 Conference, pp.3910-3918, 2012.

H. van Halteren, Source language markers in EUROPARL translations, Proceedings of the 22nd International Conference on Computational Linguistics, COLING '08, pp.937-944, 2008.
DOI : 10.3115/1599081.1599199

B. Vauquois and C. Boitet, Automated translation at Grenoble University, Computational Linguistics, vol.11, issue.1, pp.28-36, 1985.

S. Virpioja, P. Smit, S. Grönroos, and M. Kurimo, Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline, Aalto University, 2013.

J. Vogel and D. Tresner-Kirsch, Robust Language Identification in Short, Noisy Texts: Improvements to LIGA, Proceedings of the 3rd International Workshop on Mining Ubiquitous and Social Environments, pp.1-9, 2012.

Y. Vyas, S. Gella, J. Sharma, K. Bali, and M. Choudhury, POS Tagging of English-Hindi Code-Mixed Social Media Content, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.974-979, 2014.
DOI : 10.3115/v1/D14-1105

L. Wang and C. Boitet, Online production of HQ parallel corpora and permanent task-based evaluation of multiple MT systems: both can be obtained through iMAGs with no added cost, Proceedings of the 2nd Workshop on Post-Editing Technologies and Practice at MT Summit 2013, pp.103-110, 2013.

Z. Wang and M. Iwaihara, Cross-lingual Tweet Recommendation Based on User Interest Using Bilingual LDA, Proceedings of 7th Forum on Data Engineering and Information Management, 2015.

R. Yan and X. Li, Tweet Recommendation with Graph Co-Ranking, pp.516-525, 2012.

S. Yang and A. L. Kavanaugh, Half-Day Tutorial: Collecting, Analyzing and Visualizing Tweets using Open Source Tools, Proceedings of the 12th Annual International Conference on Digital Government Research, pp.374-375, 2011.

X. Yang, Y. Guo, Y. Liu, and H. Steck, A survey of collaborative filtering based social recommender systems, Computer Communications, vol.41, pp.1-10, 2014.
DOI : 10.1016/j.comcom.2013.06.009

Z. Ying, Modèles et outils pour des bases lexicales "métier" multilingues et contributives de grande taille, utilisables tant en traduction automatique et automatisée que pour des services dictionnairiques variés (Doctoral dissertation, Université Grenoble Alpes), 2016.

A. Zielinski and U. Bügel, Multilingual Analysis of Twitter News in Support of Mass Emergency Events, Proceedings of International Conference on Information Systems for Crisis Response and Management (ISCRAM), pp.5-10, 2012.

DVM file contents

** DVM: morphological "variables" (attributes) for an AM phase aiming at handling Indian tweets (Hindi, Marathi, English, Gujarati)

** For the moment, the language code is HIN (Hindi), but we may change it later to ITW (Indian TWeets)

- Exclusive variables

** Language: EN = English, FR = French, HI = Hindi, MR = Marathi, ...
LANG == (EN, FR, HI, MR, ...)

** Tactical variable: the FINAL value indicates that we want to force a unique result for AM
TAKTIK == (FINAL)

- Non-exclusive variables (set values)

** Dictionaries of bases (radicals) and affixes

D... == (...)
** By number: 1: prefixes (all languages considered, but they will contain their language id); 2: radicals (Hindi); 3: suffixes (all languages considered, but they will contain their language id); 4: radicals (English); 5: radicals (Marathi); ...

** Typography (capitalization) of an occurrence (Indian tweets often contain English words!)

T... == (ALLUPP ** ALL UPPercase, FIRSTUP ** FIRST letter UPpercase, ABBREV ** ABBREViation ending with a period, SHILEFT ** SHort I goes LEFT)
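The typography values declared above can be illustrated with a small classifier. This is a minimal sketch under stated assumptions, not the ATEF implementation. The function name `typography` and the detection rules (what counts as all-uppercase and what counts as an abbreviation) are hypothetical. Only the value names ALLUPP, FIRSTUP, ABBREV and SHILEFT come from the declaration. Because this variable sits among the non-exclusive variables, the sketch returns a set of values. For SHILEFT it assumes the intended case is the Devanagari vowel sign I (U+093F), which is written to the left of the consonant it follows.

```python
import re

# Hypothetical sketch (not the ATEF implementation): assign DVM typography values.
DEVANAGARI_VOWEL_SIGN_I = "\u093f"  # short i matra, drawn to the left of its consonant
ABBREVIATION = re.compile(r"[A-Za-z]+(\.[A-Za-z]+)*\.")  # assumed shape, e.g. "Prof.", "U.S."


def typography(token: str) -> set:
    """Return the set of typography values (a non-exclusive variable) for a token."""
    values = set()
    letters = [c for c in token if c.isalpha()]
    if len(letters) > 1 and all(c.isupper() for c in letters):
        values.add("ALLUPP")
    elif letters and letters[0].isupper():
        values.add("FIRSTUP")
    if ABBREVIATION.fullmatch(token):
        values.add("ABBREV")
    if DEVANAGARI_VOWEL_SIGN_I in token:
        values.add("SHILEFT")
    return values
```

Devanagari letters have no case, so a Hindi or Marathi token can only receive SHILEFT here. Code-mixed tokens that combine scripts would need rules of their own.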

2. DVS file contents

** DVS: "syntactic" (actually syntactic, morphological or semantic!) variables (attributes) for an AM phase aiming at handling Indian tweets, potentially multilingual, containing Hindi

** Creation: 13/09/2016

** Updated by Ritesh: added categories for Number and Person

** Subcategory of nouns
SUBN == (C ** Common, P ** Proper, V, S ** Spatio-temporal)

** Subcategory of verbs
SUBV == (M ** Main, A ** Auxiliary)

, Appendix 3 116, p.126

** Subcategory of pronouns
SUB... == (PR ** Personal, RF ** Reflexive, ...)

** Subcategory of adJuncts of nouns
SUBJ == (ADJ ** Adjective, Q ** Quantifier)

** Subcategory of demonstratives
SUBD == (AB ** ABsolute: that, RL ** ReLative: Dravidian language ???, seems to be wrong, WH ** WH-demonstrative: Dravidian language ???, seems to be wrong)

** Subcategory of adverbs
SUB... == (MN ** Manner, LC ** Location, ..., D... ** Degree)

SUB... == (RL, V, N, ... ** Nominal-building, C...)

** Subcategory of postpositions: none

** Subcategory of ...

** Subcategory of punctuations
SUB... == (SIPUL ** SIngle PUnctuation Left: "itemizer" such as a hyphen or dash after a line break like \n or <br> or </h1>, or "enumerator" such as 1-2-4-a) or ii),
SIPUR ** SIngle PUnctuation Right: period, comma, ellipsis, colon, semicolon, question mark, exclamation mark,
SIHT ** SIngle HTml tags, i.e. monotags (e.g. <img a-v list />, to be preprocessed into, for example, %%HTMMONOTAG_img_24),
DOPUL ** DOuble PUnctuation Left: opening quote, parenthesis, bracket, brace, parenthetical dash,
DOPUR ** DOuble PUnctuation Right: closing quote, parenthesis, bracket, brace, parenthetical dash,
DOHTL ** DOuble HTml tags Left (<i a-v list>, to be preprocessed into, for example, %%HTMOPENTAG_i_25),
DOHTR ** DOuble HTml tags Right (</i>, to be preprocessed into, for example, %%HTMCLOSETAG_i_25),
X...)
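The preprocessing of HTML tags into placeholders such as %%HTMMONOTAG_img_24, %%HTMOPENTAG_i_25 and %%HTMCLOSETAG_i_25 can be sketched as follows. The examples suggest that an opening tag and its matching closing tag share one index while a monotag gets its own. The sketch assumes this, along with a single running counter and a fixed list of void elements treated as monotags. The function `replace_html_tags` is hypothetical. It also returns a table that maps each placeholder back to the original tag, so the text can be restored after analysis.

```python
from __future__ import annotations

import re

# Hypothetical sketch: replace HTML tags by indexed placeholders before analysis.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>")
VOID = {"br", "hr", "img", "input", "link", "meta"}  # assumed to behave as monotags


def replace_html_tags(text: str, start: int = 1) -> tuple[str, dict[str, str]]:
    counter = start
    open_tags: list[tuple[str, int]] = []  # stack of (tag name, index) still open
    table: dict[str, str] = {}  # placeholder -> original tag text

    def repl(m: re.Match) -> str:
        nonlocal counter
        is_close, name, self_closing = m.group(1), m.group(2).lower(), m.group(3)
        if self_closing or (name in VOID and not is_close):
            placeholder = f"%%HTMMONOTAG_{name}_{counter}"
            counter += 1
        elif not is_close:
            placeholder = f"%%HTMOPENTAG_{name}_{counter}"
            open_tags.append((name, counter))
            counter += 1
        else:
            # Reuse the index of the innermost still-open tag with the same name.
            index = None
            for i in range(len(open_tags) - 1, -1, -1):
                if open_tags[i][0] == name:
                    index = open_tags.pop(i)[1]
                    break
            if index is None:  # unmatched closing tag: give it a fresh index
                index = counter
                counter += 1
            placeholder = f"%%HTMCLOSETAG_{name}_{index}"
        table[placeholder] = m.group(0)
        return placeholder

    return TAG.sub(repl, text), table
```

With `start=25`, the input `<i>Hi</i>` is rewritten so that both tags carry index 25, matching the pairing visible in the DOHTL/DOHTR examples.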

** Subcategory of emojis and phatics
SUBEM == (EMOLIST ** Emoji list, ... ** Phatic, ...)

** Subcategory of tweet-specific occurrences
SUB... == (TWCD ** Command, T... ** Address)

** Subcategory of Out of Text elements (hors-texte in French)
SUB... == (IMAGE ** image or icon, can function as a proper singular noun (for example, %%IMG_15),
M... ** can function as a noun (for example, $ab+2$ preprocessed as %%MEXP_13),
M... ** can function as a noun or verbal kernel (for example, $xy>2$ preprocessed as %%MREL_43),
P... ** B..., p.123,
P... ** O. Capitan, pp.10-11,
CHEMELEM ** chemical element like H_2O or Al_2O_3, possibly preprocessed as %%CHEMELEM_22,
CHEMFORM ** CHEMical FORMula (like benzenes, preprocessed as %%CHEMFORM_23),
SYSSTR ** like menu names, menu items, command lines, system answers,
S... ** Symbol (£, %),
N... ** Nominal-building,
U... ** Unknown)
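The listing separates MEXP (usable as a noun) from MREL (usable as a noun or verbal kernel). This suggests a simple preprocessing rule: a $...$ span that contains a relation symbol becomes an MREL placeholder, and any other span becomes an MEXP placeholder. The sketch below assumes that rule, one running counter shared by both kinds, a fixed set of relation symbols, and a hypothetical function name `replace_math`. The preprocessing actually used with ATEF is not described here.

```python
from __future__ import annotations

import re

MATH = re.compile(r"\$([^$]+)\$")   # inline math between single dollar signs
RELATION = re.compile(r"[<>=≤≥≠]")  # assumed set of relation symbols


def replace_math(text: str, start: int = 1) -> tuple[str, dict[str, str]]:
    """Replace $...$ spans by %%MREL_n (relations) or %%MEXP_n (other expressions)."""
    counter = start
    table: dict[str, str] = {}  # placeholder -> original span

    def repl(m: re.Match) -> str:
        nonlocal counter
        kind = "MREL" if RELATION.search(m.group(1)) else "MEXP"
        placeholder = f"%%{kind}_{counter}"
        counter += 1
        table[placeholder] = m.group(0)
        return placeholder

    return MATH.sub(repl, text), table
```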

** Morphosyntactic category
CAT == (N ** Noun, V ** Verb, P... ** Pronoun,
J ** Nominal modifier,
D ** Demonstrative, A ** Adverb, L...,
PP ** Postposition, C...,
P... ** Punctuation,
E ** Emoji and Phatic,
O ** Out of Text (hors-texte in French),
T... ** Command,
RD ** Residual)

MA input: ???, ??
Total output: 2
[Root: ?"??, Class: , Category: verb, Suffix: Null] [Gender: x, Number: x, Person: x, Case: x, Tense: x, Aspect: x

, Category : verb, Suffix : ?? ] [ Gender : +masc, Number : -pl, Person : x, Case : x, Tense : x, Aspect : +perfect, Mood, Set of Roots and Features

:. Token, T. ?&, and . Output, Set of Roots and Features

, Token : ?? #??&, issue.1

, Number : +pl, Person : x, Case : +oblique, Tense : x, Aspect : x, Mood : x ] [ Gender : -masc, Number : +pl, Person : x, Case : +oblique, Tense : x, Aspect : x, Mood : x ] [ Gender : -masc, Number : +pl, Person : x, Case : +oblique, Tense : x, Aspect : x, Mood, Set of Roots and Features

!. Token, T. ???-??-', and . Output, Set of Roots and Features

, Token : ?? ???, issue.1

Category: verb, Suffix: ?] [Gender: +masc, Number: +pl, Person: x, Case: x, Tense: x, Aspect: +perfect, Mood

Morphological analyses output from the IIIT public web service

Input: ???
Address TOKEN Features (af='root,cat,gen,num,per,case,tam

, ??,adj,m,sg,d,'> <fs af='?"??,n,m,sg,3,d,0,0'> <fs af=

!. ??, $. ??-'<fs-af=-'!, . ??, and . ??, 0'> 5 ?? ??? <fs af='?? ?,v,any,sg,2,?,e' hon='y'> <fs af='?? ?,v,any,pl, p.122126
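The IIIT output above stores each analysis in the af attribute of an <fs> element. The attribute is a comma-separated list whose fields are named in the header (root, cat, gen, num, per, case, tam, ...). A minimal sketch for reading these attributes follows. Two points are assumptions: the eighth field name, `suffix`, because the header is truncated; and padding analyses that have fewer fields with empty strings, as some of the lines above do. The function name `parse_af_attributes` is hypothetical, and the romanized token in the test is illustrative only.

```python
from __future__ import annotations

import re

# First seven names come from the header above; "suffix" is an assumed eighth field.
AF_FIELDS = ("root", "cat", "gen", "num", "per", "case", "tam", "suffix")
FS_AF = re.compile(r"<fs\b[^>]*?\baf='([^']*)'")


def parse_af_attributes(line: str) -> list[dict[str, str]]:
    """Return one dict per <fs af='...'> found in an output line."""
    analyses = []
    for af in FS_AF.findall(line):
        values = af.split(",")
        values += [""] * (len(AF_FIELDS) - len(values))  # pad short analyses
        # zip() silently drops any values beyond the eight named fields.
        analyses.append(dict(zip(AF_FIELDS, values)))
    return analyses
```

Other attributes on the same element, such as hon='y', are ignored by this sketch.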

Appendix 7 Morphological analyses of code-mixed tweets by ATEF

1

?. ??, ?. ????, and . ???-???-?-??-???-?????-'????-????-'<-/-tweet>, UL('ULTXT') 2 '': UL('ULFRA') 3 '': UL('ULOCC')

, PER(TD) 5 '': UL('ULOCC')

, NUM(SNG,PLR) 7 '': UL('ULOCC')

?. ????-': and . Ul, CAT(RD) 9 '': UL

, CAT(RD) 11 '': UL

?. ???, CAT(RD) 13 '': UL

, (PP) 15

, CAT(RD) 17 '': UL('ULOCC') 18 '?????': UL('?????'), CAT(RD) 19 '': UL('ULOCC') 20 ''????': UL, CAT(RD) 21 '': UL('ULOCC') 22 '????'': UL('????_'), CAT(RD) 23 '': UL('ULOCC') 24 '<tweet LANG (XML), p.UL

R. and ?. ?????, UL('ULFRA') 4338 '': UL('ULOCC') 4339 'RT': UL('RT'), CAT(RD) 4340 '': UL('ULOCC')

, 4341 '@viveklkw:': UL('@viveklkw:'), CAT(RD) 4342

, 4344 UL('ULOCC') 4345 ' ! ': UL, CAT(RD) 4346 '': UL

, 4347, CAT(RD) 4348 '': UL

, NUM(SNG) 4350 '': UL

?. , CAT(RD) 4352 CAT(RD) 4356 '': UL('ULOCC') 4357 '????': UL CAT(RD) 4358 '': UL('ULOCC') 4359 '?? ': UL, CAT(RD) 4354 '': UL('ULOCC') 4355 '???': UL('???') CAT(RD) 4360 '': UL('ULOCC') 4361 '??? ??': UL('??? ??'), CAT(RD) 4362 '': UL('ULOCC') 4363 '????? ': UL('????? '), CAT(RD) 4364 '': UL('ULOCC') 4365 '?? ?? ': UL('?? ?? '), CAT(RD) 4366 '': UL('ULOCC') 4367 '?? ': UL('?? '), CAT(RD) 4368 '': UL('ULOCC') 4369 '..': UL('..'), CAT(RD) 4370 '': UL('ULOCC')

, SUBV(M) 4372 '': UL('ULOCC') 4373 '?? ': UL('?? '), CAT(RD) 4374 '': UL('ULOCC') 4375 '????? ': UL('????? '), CAT(RD) 4376 '': UL('ULOCC')


, SUBJ(Q) 4378 '': UL('ULOCC')

, SUBV(M) 4380 '': UL('ULOCC') 4381 '??????': UL('??????'), CAT(RD) 4382 '': UL('ULOCC')

LANG (XML)
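Each line of the ATEF listing above appears to pair a node number and a quoted occurrence with two kinds of information: its lexical unit, UL(...), and variable values such as CAT(RD), SUBV(M) or NUM(SNG,PLR). The following sketch turns such lines into records for inspection. It assumes that layout, which is inferred from the listing and not taken from ATEF documentation. `parse_atef_node` is a hypothetical name, and lines that do not follow the pattern are skipped by returning None.

```python
from __future__ import annotations

import re

NODE = re.compile(r"(\d+)\s+'(.*?)':\s*(.*)")  # number, 'occurrence':, rest of line
VARIABLE = re.compile(r"([A-Z]+)\(([^)]*)\)")  # NAME(value1,value2,...)


def parse_atef_node(line: str) -> tuple[int, str, dict[str, list[str]]] | None:
    """Parse one assumed ATEF output line into (node number, occurrence, variables)."""
    m = NODE.match(line.strip())
    if m is None:
        return None
    number, occurrence, rest = m.groups()
    variables = {
        name: [value.strip().strip("'") for value in args.split(",")]
        for name, args in VARIABLE.findall(rest)
    }
    return int(number), occurrence, variables
```

Collecting the CAT values of all nodes in this way would, for instance, make it easy to count how many occurrences were left as residuals (RD).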