Input: set of features built from the enriched messages; N: total number of features; L: number of features chosen at random; seuil: minimum weight of the words considered
Output: semantic forest

/* Construction of the semantic network */
réseau = appliquerLDASur(car[N])
FOR each tree:
    espaceInitial = choose L features at random (L < N)
    /* enlargement of the feature space */
    espaceCaracteristiquesFinal ← [ ]
    FOR each caractéristique in espaceInitial:
        ListeCaracteristiquesAAjouter = SFS(caractéristique, réseau, seuil)
        espaceCaracteristiquesFinal += ListeCaracteristiquesAAjouter
    Build the tree using only espaceCaracteristiquesFinal

Algorithm 6: Semantic Feature Selection (SFS)
Input: caractéristique, réseau, seuil
Output: ListeDeCaractéristiques[ ]

thème = …
IF thème:
    FOR each mot in thème:
        IF Poids(mot) > seuil:
            ListeDeCaractéristiques += mot
    return (ListeDeCaractéristiques)
ELSE:
    return
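The two procedures above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions, not the author's implementation: the semantic network is taken to be a precomputed mapping from topics to word weights (the LDA step itself is not run here), the theme of a feature is assumed to be the topic that gives it the highest weight (the source does not show how the theme is obtained), the truncated ELSE branch is assumed to return an empty list, and duplicates are dropped when the feature space is enlarged. The names `sfs`, `expanded_feature_spaces`, and the toy `network` are hypothetical.

```python
import random


def sfs(feature, network, threshold):
    """Semantic Feature Selection: words of the feature's theme whose weight exceeds threshold."""
    # Assumption: the theme is the topic in which the feature weighs the most.
    candidates = [topic for topic in network.values() if feature in topic]
    if not candidates:
        # Assumption: the ELSE branch is truncated in the source; return nothing.
        return []
    theme = max(candidates, key=lambda topic: topic[feature])
    return [word for word, weight in theme.items() if weight > threshold]


def expanded_feature_spaces(features, network, n_trees, L, threshold, seed=0):
    """For each tree, draw L features at random and enlarge them with SFS."""
    rng = random.Random(seed)
    spaces = []
    for _ in range(n_trees):
        initial_space = rng.sample(features, L)  # L features chosen among N
        final_space = []
        for feature in initial_space:
            for word in sfs(feature, network, threshold):
                if word not in final_space:  # assumption: keep each word once
                    final_space.append(word)
        spaces.append(final_space)
    return spaces


# Toy network standing in for the output of LDA (hypothetical weights).
network = {
    "topic_0": {"flood": 0.40, "water": 0.35, "rain": 0.05},
    "topic_1": {"fire": 0.50, "smoke": 0.30, "water": 0.10},
}
features = ["flood", "water", "rain", "fire", "smoke"]

spaces = expanded_feature_spaces(features, network, n_trees=3, L=2, threshold=0.2)
```

Each entry of `spaces` is the enlarged feature space on which one tree would then be built; tree induction itself is left out of the sketch.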
