Input: car[N]: set of features built from the enriched messages; N: total number of features; L: number of features chosen at random; seuil: minimum weight of the words considered
Output: semantic forest

/* Construction of the semantic network */
réseau = appliquerLDASur(car[N])
FOR each tree:
    espaceInitial = randomly choose L features (L < N)
    /* Enlargement of the feature space */
    espaceCaracteristiquesFinal ← [ ]
    FOR each feature in espaceInitial:
        [...]
        espaceCaracteristiquesFinal += ListeCaracteristiquesAAjouter
    Build the tree using only espaceCaracteristiquesFinal

Algorithm 6: Semantic Feature Selection (SFS)
Input: caractéristique, réseau, seuil
Output: ListeDeCaractéristiques[ ]

thème = [...]
FOR each mot in thème:
    IF Poids(mot) > seuil:
        ListeDeCaractéristiques += mot
return (ListeDeCaractéristiques)
ELSE return [...]
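The two procedures above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the semantic network is represented as a plain dictionary mapping each topic to its word weights (a stand-in for the LDA output), the topic of a feature is taken to be the one in which that feature has the highest weight, and an empty list is returned when the feature belongs to no topic. The source text does not specify those last two points. The function names `semantic_feature_selection` and `expand_feature_space` are hypothetical.

```python
import random


def semantic_feature_selection(feature, network, threshold):
    """Return the words of the feature's topic whose weight exceeds threshold.

    `network` is a hypothetical stand-in for the LDA result:
    {topic_name: {word: weight}}.
    """
    # ASSUMPTION: the topic of a feature is the topic where it weighs most.
    candidates = [
        (weights[feature], topic)
        for topic, weights in network.items()
        if feature in weights
    ]
    if not candidates:
        # ASSUMPTION: a feature found in no topic contributes nothing.
        return []
    _, topic = max(candidates)
    return [word for word, weight in network[topic].items() if weight > threshold]


def expand_feature_space(features, network, n_selected, threshold, rng):
    """Draw n_selected features at random and enlarge them semantically."""
    initial_space = rng.sample(features, n_selected)
    final_space = []
    for feature in initial_space:
        final_space += semantic_feature_selection(feature, network, threshold)
    # ASSUMPTION: duplicates are removed, keeping first-seen order.
    return list(dict.fromkeys(final_space))


if __name__ == "__main__":
    toy_network = {
        "t1": {"flood": 0.5, "water": 0.3, "rain": 0.05},
        "t2": {"food": 0.6, "water": 0.1, "hungry": 0.2},
    }
    space = expand_feature_space(
        ["flood", "hungry", "water", "rain"], toy_network, 2, 0.1, random.Random(0)
    )
    print(space)  # one tree would then be built on this feature subset only
```

Each tree of the forest would call `expand_feature_space` once and then be trained on the returned subset. Passing an explicit `random.Random` instance keeps the random draw reproducible.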