Chapter 7. IRIES: an interactive system for learning information extraction rules

Each set of annotation rules for an ontology element is built iteratively and interactively, until it satisfactorily covers the text spans that must be mapped to the ontology element being processed (the Interactive rule learning package in Figure 9.1). At each iteration, a user intervenes, working dually on the annotation rules (intension) as well as […]
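The iterative loop described above can be sketched as follows. This is a minimal illustration, not the IRIES implementation, and it rests on several assumptions:

- rules are plain regular expressions;
- the extension of a rule set is the set of character spans it matches;
- "satisfactory coverage" means that the fraction of expected spans matched exactly reaches a threshold.

The names `apply_rules`, `coverage` and `refine_interactively` are hypothetical. So is the `user_step` callback, which stands in for the human user.

```python
import re
from typing import Callable, List, Set, Tuple

# Illustrative sketch: regex rules and exact-span coverage are assumptions,
# not the actual rule language or stopping criterion of IRIES.
Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def apply_rules(rules: List[str], text: str) -> Set[Span]:
    """Extension of a rule set: every span matched by at least one rule."""
    spans: Set[Span] = set()
    for pattern in rules:
        for m in re.finditer(pattern, text):
            spans.add(m.span())
    return spans


def coverage(found: Set[Span], expected: Set[Span]) -> float:
    """Fraction of the expected spans that the rules match exactly."""
    if not expected:
        return 1.0
    return len(found & expected) / len(expected)


# Hypothetical callback: given the current rules (intension), their matches
# (extension) and the target spans, the user returns a revised rule set.
UserStep = Callable[[List[str], Set[Span], Set[Span]], List[str]]


def refine_interactively(rules: List[str], text: str, expected: Set[Span],
                         user_step: UserStep, threshold: float = 1.0,
                         max_iter: int = 10) -> Tuple[List[str], float]:
    score = coverage(apply_rules(rules, text), expected)
    for _ in range(max_iter):
        if score >= threshold:
            break  # the rule set is judged satisfactory
        rules = user_step(rules, apply_rules(rules, text), expected)
        score = coverage(apply_rules(rules, text), expected)
    return rules, score


if __name__ == "__main__":
    text = "E. coli and B. subtilis live in soil."
    expected = {(0, 7), (12, 23)}  # spans for a hypothetical "Bacteria" concept

    def simulated_user(rules, found, expected):
        # Stand-in for the human: generalize the initial, overly specific rule.
        return rules + [r"[A-Z]\. [a-z]+"]

    final_rules, score = refine_interactively([r"E\. coli"], text, expected, simulated_user)
    print(len(final_rules), score)  # 2 1.0
```

The initial rule covers only half of the expected spans. One simulated refinement, which adds a more general pattern, is enough to reach full coverage. In a real session the user would also inspect spurious matches, which this sketch ignores.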

However, the elements of an ontology are linked to one another. This hierarchy should therefore be exploited as fully as possible to impose some order and structure on the information extraction rules. Several questions must consequently be addressed:
- what relation should hold between a rule that annotates a concept and a rule that annotates one of its children?
- what relation should hold between a rule that annotates a concept and a rule that annotates an instance of that concept?
- what relation should hold between a rule that annotates a relation between two concepts, […]

[…] The genericity of our approach means that it can annotate any target (a concept, a named entity, a relation […]

Indeed, a conflict arises when two overlapping text segments are covered by two different rules. Several strategies for resolving this kind of conflict exist in the literature (see Section 2.6.2). We advocate treating the rules as an unordered collection in which each rule applies independently of the others. This way of organizing them gives the user more flexibility in defining rules, without having to worry about possible overlaps with existing rules. […] Our approach can therefore be evaluated as a whole in the context of semantic annotation with respect to ontologies […, 2002].
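The sketch below makes the notion of conflict defined above concrete. It applies an unordered collection of rules independently to a text, then lists the conflicts, that is, pairs of overlapping annotations produced by two different rules. This is an illustration under simplifying assumptions, not the system's implementation:

- rules are regular expressions;
- each rule is tagged with the ontology element (label) it annotates;
- the names `Annotation`, `apply_unordered` and `find_conflicts` are hypothetical.

```python
import re
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

# Illustrative sketch: regex rules tagged with a label are an assumption.


@dataclass(frozen=True)
class Annotation:
    rule_id: str
    label: str  # ontology element annotated by the rule
    start: int
    end: int    # exclusive


def apply_unordered(rules: Dict[str, Tuple[str, str]], text: str) -> List[Annotation]:
    """Run every rule on the raw text, independently of the others (no priorities)."""
    annotations = []
    for rule_id, (label, pattern) in rules.items():
        for m in re.finditer(pattern, text):
            annotations.append(Annotation(rule_id, label, m.start(), m.end()))
    return annotations


def overlaps(a: Annotation, b: Annotation) -> bool:
    return a.start < b.end and b.start < a.end


def find_conflicts(annotations: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Conflict: two overlapping segments covered by two different rules."""
    return [(a, b) for a, b in combinations(annotations, 2)
            if a.rule_id != b.rule_id and overlaps(a, b)]


if __name__ == "__main__":
    text = "Bacillus subtilis was isolated from soil samples."
    rules = {
        "R1": ("Bacteria", r"Bacillus \w+"),
        "R2": ("Genus", r"Bacillus"),
        "R3": ("Habitat", r"soil samples"),
        "R4": ("Habitat", r"soil"),
    }
    for a, b in find_conflicts(apply_unordered(rules, text)):
        print(a.rule_id, b.rule_id)  # prints "R1 R2" then "R3 R4"
```

No rule has priority, so adding a rule never changes what the others match. Conflicts are only detected afterwards, which is what leaves the user free to write rules without first checking them against the existing ones.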

- merge overlapping text segments if their corresponding rules share the same action, or use more sophisticated policies such as the one defined in (Reiss et al., 2008).
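The merging strategy in the item above can be sketched as an interval merge performed separately for each action. This sketch makes the following assumptions:

- the "action" of a rule is the label it assigns;
- spans are `(label, start, end)` tuples with exclusive ends;
- `merge_same_action` is a hypothetical name.

Overlaps between different labels are deliberately left untouched, since they call for another policy.

```python
from typing import List, Tuple

# Illustrative sketch: "action" is assumed to be the label a rule assigns.
Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def merge_same_action(spans: List[Span]) -> List[Span]:
    """Merge overlapping spans that share a label; keep every other span as is."""
    merged: List[Span] = []
    # Sorting by (label, start, end) makes overlapping same-label spans adjacent.
    for label, start, end in sorted(spans):
        if merged and merged[-1][0] == label and start < merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (label, prev_start, max(prev_end, end))
        else:
            merged.append((label, start, end))
    return merged


if __name__ == "__main__":
    spans = [("Habitat", 36, 48), ("Habitat", 36, 40),
             ("Bacteria", 0, 17), ("Genus", 0, 8)]
    print(merge_same_action(spans))
    # [('Bacteria', 0, 17), ('Genus', 0, 8), ('Habitat', 36, 48)]
```

The two `Habitat` spans collapse into one. The `Bacteria` and `Genus` spans still overlap but carry different actions, so they remain a conflict for another strategy to handle. Spans that merely touch (end of one equals start of the next) are not merged, because they do not overlap.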

In an interactive system, we could readily develop a module that identifies conflicting rules and presents them to the user, leaving the user to choose the action to take to resolve the conflict.
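Such a module could look like the following sketch, which is only one possible design. It groups conflicts by pair of rules, asks a `choose` callback (standing in for the user interface) which rule should prevail for each pair, and removes the losing rule's overlapping annotations. Annotations are assumed to be `(rule_id, start, end)` tuples. The names `conflicting_rule_pairs`, `resolve_with_user` and `choose` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, List, Optional, Tuple

# Illustrative sketch of a hypothetical conflict-resolution module.
Ann = Tuple[str, int, int]  # (rule_id, start, end), end exclusive
# The callback receives a pair of rule ids and the number of overlaps between
# them, and returns the id of the rule to keep, or None to keep both.
Chooser = Callable[[Tuple[str, str], int], Optional[str]]


def overlap(a: Ann, b: Ann) -> bool:
    return a[1] < b[2] and b[1] < a[2]


def conflicting_rule_pairs(anns: List[Ann]) -> Counter:
    """Count overlapping annotations for each unordered pair of distinct rules."""
    pairs: Counter = Counter()
    for a, b in combinations(anns, 2):
        if a[0] != b[0] and overlap(a, b):
            pairs[tuple(sorted((a[0], b[0])))] += 1
    return pairs


def resolve_with_user(anns: List[Ann], choose: Chooser) -> List[Ann]:
    """Each conflicting pair is decided independently by the `choose` callback."""
    dropped = set()
    for pair, n in sorted(conflicting_rule_pairs(anns).items()):
        keep = choose(pair, n)
        if keep is None:
            continue  # the user accepts both annotations
        if keep not in pair:
            raise ValueError(f"{keep!r} is not one of {pair}")
        loser = pair[1] if keep == pair[0] else pair[0]
        winners = [a for a in anns if a[0] == keep]
        # Drop only the loser's annotations that overlap one of the winner's.
        dropped.update(a for a in anns
                       if a[0] == loser and any(overlap(a, w) for w in winners))
    return [a for a in anns if a not in dropped]


if __name__ == "__main__":
    anns = [("R1", 0, 17), ("R2", 0, 8), ("R3", 36, 48), ("R4", 36, 40)]
    decisions = {("R1", "R2"): "R1", ("R3", "R4"): None}  # simulated user answers
    print(resolve_with_user(anns, lambda pair, n: decisions[pair]))
    # [('R1', 0, 17), ('R3', 36, 48), ('R4', 36, 40)]
```

Grouping by rule pair rather than by individual overlap keeps the number of questions put to the user small. The count `n` passed to the callback is an indication of how often two rules collide.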

Bibliography

S. Aitken, Learning Information Extraction Rules : An Inductive Logic Programming approach, Proceedings of ECAI 2002 (European Conference on Artificial Intelligence), pp.355-359, 2002.

A. Akbik, O. Konomi, and M. Melnikov, Propminer : A Workflow for Interactive Information Extraction and Exploration using Dependency Trees, ACL (Conference System Demonstrations), pp.157-162, 2013.

D. E. Appelt, Introduction to Information Extraction, AI Commun, vol.12, pp.161-172, 1999.

E. Aramaki, T. Imai, K. Miyo, and K. Ohe, Automatic deidentification by using sentence features and label consistency, i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data, pp.10-11, 2006.

M. Atzmueller, P. Kluegl, and F. Puppe, Rule-Based Information Extraction for Structured Data Acquisition using TextMarker, LWA-2008 (Special Track on Knowledge Discovery and Machine Learning), pp.1-7, 2008.

L. Audibert, Étude des critères de désambiguïsation sémantique automatique : résultats sur les cooccurrences, Actes de TALN 2003, pp.35-44, 2003.

M. Bank and M. Schierle, A Survey of Text Mining Architectures and the UIMA Standard, LREC, pp.3479-3486, 2012.

M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni, Open Information Extraction from the Web, Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp.2670-2676, 2007.

M. Banko and O. Etzioni, The Tradeoffs Between Open and Traditional Relation Extraction, ACL-HLT, pp.28-36, 2008.

S. Bannour, L. Audibert, and A. Nazarenko, Mesures de similarité distributionnelle entre termes, 22es journées francophones d'ingénierie des connaissances (IC2011), pp.523-538, 2011.

S. Bannour, L. Audibert, and H. Soldano, Ontology-based semantic annotation : an automatic hybrid rule-based method, Proceedings of BioNLP Shared Task 2013 Workshop, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01074936

R. Basili, M. T. Pazienza, and M. Vindigni, Corpus-driven learning of Event Recognition Rules, Proceedings ECAI Workshop on Machine Learning for Information Extraction, 2000.

L. E. Baum and T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, Annals of Mathematical Statistics, vol.37, pp.1554-1563, 1966.

T. Berners-lee, J. Hendler, and O. Lassila, The Semantic Web, Scientific American, vol.284, pp.34-43, 2001.

D. M. Bikel, R. Schwartz, and R. M. Weischedel, An algorithm that learns what's in a name, Mach. Learn, vol.34, issue.1-3, pp.211-231, 1999.

M. W. Bilotti, B. Katz, and J. Lin, What works better for question answering : Stemming or morphological query expansion, Proceedings of the information retrieval for question answering (ir4qa) workshop at sigir, pp.1-3, 2004.

K. Bontcheva, V. Tablan, D. Maynard, and H. Cunningham, Evolving GATE to Meet New Challenges in Language Engineering, Nat. Lang. Eng, vol.10, pp.349-373, 2004.

R. Bossy, W. Golik, Z. Ratkovic, P. Bessières, and C. Nédellec, BioNLP Shared Task 2013 - An overview of the Bacteria Biotope Task, Proceedings of BioNLP Shared Task, 2013.

F. Brauer, R. Rieger, A. Mocan, and W. M. Barczynski, Enabling Information Extraction by Inference of Regular Expressions from Sample Entities, Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp.1285-1294, 2011.

S. Brin, Extracting Patterns and Relations from the World Wide Web, Selected papers from the International Workshop on The World Wide Web and Databases, pp.172-183, 1999.

E. Brunet, Le lemme comme on l'aime
URL : https://hal.archives-ouvertes.fr/hal-01790696

P. Buitelaar, P. Cimiano, P. Haase, and M. Sintek, Towards Linguistically Grounded Ontologies, Proceedings of the 6th European Semantic Web Conference on The Semantic Web : Research and Applications, pp.111-125, 2009.

M. E. Califf and R. J. Mooney, Relational Learning of Pattern-Match Rules for Information Extraction, Working Notes of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pp.6-11, 1998.

M. E. Califf and R. J. Mooney, Bottom-up relational learning of pattern matching rules for information extraction, J. Mach. Learn. Res, vol.4, pp.177-210, 2003.

C. Cardie and D. Pierce, Proposal for an Interactive Environment for Information Extraction, 1998.

R. Caruana, P. Hodor, and J. Rosenberg, High precision information extraction, KDD-2000 Workshop on Text Mining, 2000.

J. Y. Chai, A. W. Biermann, and C. I. Guinn, Two dimensional generalization in information extraction, Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference, pp.431-438, 1999.

H. L. Chieu, Named entity recognition : a maximum entropy approach using global information, Proceedings of COLING02, pp.190-196, 2002.

H. L. Chieu and H. T. Ng, Named entity recognition : A maximum entropy approach using global information, Coling, 2002.

H. L. Chieu and H. T. Ng, Named Entity Recognition with a Maximum Entropy Approach, Proceedings of CoNLL-2003, pp.160-163, 2003.

L. Chiticariu, R. Krishnamurthy, Y. Li, S. Raghavan, F. R. Reiss et al., SystemT : An Algebraic Approach to Declarative Information Extraction, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp.128-137, 2010.

L. Chiticariu, Y. Li, S. Raghavan, and F. Reiss, Enterprise information extraction : recent developments and open challenges, Proceedings of the ACM SIGMOD International Conference on Management of Data, pp.1257-1258, 2010.

L. Chiticariu, Y. Li, and F. R. Reiss, Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!, EMNLP, pp.827-832, 2013.

Y. Choueka and S. Lusignan, Disambiguation by short contexts, Computers and the Humanities, vol.19, issue.3, pp.147-157, 1985.

F. Ciravegna, (LP)², an Adaptive Algorithm for Information Extraction from Web-related Texts, Proceedings of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, 2001.

F. Ciravegna, (LP)² : Rule Induction for Information Extraction Using Linguistic Constraints, 2003.

F. Ciravegna, A. Dingli, Y. Wilks, and D. Petrelli, Amilcare : adaptive information extraction for document annotation, SIGIR, pp.367-368, 2002.

W. S. Cleveland, Robust Locally Weighted Regression and Smoothing Scatterplots, Journal of the American Statistical Association, vol.74, pp.829-836, 1979.

D. Cohn, L. Atlas, and R. Ladner, Improving Generalization with Active Learning, Mach. Learn, vol.15, pp.201-221, 1994.

N. Collier, C. Nobata, and J. I. Tsujii, Extracting the names of genes and gene products with a hidden Markov model, Proceedings of the 18th Conference on Computational Linguistics - Volume 1, pp.201-207, 2000.

A. Culotta, T. T. Kristjansson, A. Mccallum, and P. A. Viola, Corrective feedback and persistent learning for information extraction, Artif. Intell, vol.170, pp.1101-1122, 2006.

A. Culotta and A. Mccallum, Confidence Estimation for Information Extraction, Proceedings of HLT-NAACL 2004 : Short Papers, pp.109-112, 2004.

H. Cunningham, GATE, a General Architecture for Text Engineering, Computers and the Humanities, vol.36, pp.223-254, 2002.

H. Cunningham, Information Extraction, Automatic, Encyclopedia of Language and Linguistics, 2005.

H. Cunningham, D. Maynard, and V. Tablan, JAPE : a Java Annotation Patterns Engine, Sheffield, 2000.

I. Dagan and S. P. Engelson, Committee-Based Sampling For Training Probabilistic Classifiers, Proceedings of the Twelfth International Conference on Machine Learning, pp.150-157, 1995.

G. Dejong, An Overview of the FRUMP System, Strategies for Natural Language Processing, pp.149-176, 1982.

L. Ding, T. Finin, A. Joshi, R. Pan, R. S. Cost et al., Swoogle : A Search and Metadata Engine for the Semantic Web, Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp.652-659, 2004.

G. R. Doddington, A. Mitchell, M. A. Przybocki, L. A. Ramshaw, S. Strassel et al., The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation, LREC, 2004.

J. Du, Z. Zhang, J. Yan, Y. Cui, and Z. Chen, Using Search Session Context for Named Entity Recognition in Query, Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.765-766, 2010.

B. Eckstein, P. Kluegl, and F. Puppe, Towards Learning Error-Driven Transformations for Information Extraction, Proceedings of the LWA 2011 - Learning, Knowledge, Adaptation, 2011.

K. Eichler, H. Hemsen, M. Löckelt, G. Neumann, and N. Reithinger, Interactive Dynamic Information Extraction, pp.54-61, 2008.

A. Ekbal and S. Bandyopadhyay, Improving the Performance of a NER System by Post-processing and Voting, Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pp.831-841, 2008.

D. W. Embley, Toward Semantic Understanding : An Approach Based on Information Extraction Ontologies, Proceedings of the 15th Australasian Database Conference, pp.3-12, 2004.

M. Es-salihe and S. Bond, Étude des frameworks UIMA, GATE et OpenNLP, 2006.

O. Etzioni, M. Banko, S. Soderland, and D. S. Weld, Open Information Extraction from the Web, Commun. ACM, vol.51, pp.68-74, 2008.

O. Etzioni, A. Fader, J. Christensen, S. Soderland, and M. Mausam, Open Information Extraction : The Second Generation, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pp.3-10, 2011.

A. Fader, S. Soderland, and O. Etzioni, Identifying Relations for Open Information Extraction, Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.1535-1545, 2011.

R. Fagin, B. Kimelfeld, F. Reiss, and S. Vansummeren, Spanners : A Formal Framework for Information Extraction, Proceedings of the 32nd Symposium on Principles of Database Systems, pp.37-48, 2013.

C. Fellbaum, WordNet : An Electronic Lexical Database, 1998.

D. Ferrucci and A. Lally, UIMA : an architectural approach to unstructured information processing in the corporate research environment, Nat. Lang. Eng, vol.10, pp.327-348, 2004.

A. Finn and N. Kushmerick, Active learning selection strategies for information extraction, Proceedings of the Workshop on Adaptive Text Extraction and Mining, 2003.

A. Finn and N. Kushmerick, Multi-level Boundary Classification for Information Extraction, ECML'04, pp.111-122, 2004.

D. Freitag and N. Kushmerick, Boosted Wrapper Induction, AAAI/IAAI, pp.577-583, 2000.

D. Freitag and A. Mccallum, AAAI/IAAI, pp.584-589

D. Freitag and A. K. Mccallum, Information Extraction with HMMs and Shrinkage, Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction, pp.31-36, 1999.

J. Fürnkranz, Separate-and-Conquer Rule Learning, Artif. Intell. Rev, vol.13, pp.3-54, 1999.

R. Gaizauskas, G. Demetriou, and K. Humphreys, Term recognition and classification in biological science journal articles, Proc. of the computational terminology for medical and biological applications workshop of the 2nd international conference on nlp, pp.37-44, 2000.

J. J. Gardner and L. Xiong, HIDE : An Integrated System for Health Information DE-identification, Proceedings of the Twenty-First IEEE International Symposium on Computer-Based Medical Systems, pp.254-259, 2008.

W. Golik, R. Bossy, Z. Ratkovic, and C. Nédellec, Improving Term Extraction with Linguistic Analysis in the Biomedical Domain, Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing'13), pp.24-30, 2013.

R. Grishman, The TIPSTER Text Phase II Architecture Design, Version 2.2 (Rapport technique), 1996.

R. Grishman, Information Extraction : Capabilities and Challenges, 2012.

T. R. Gruber, A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, vol.5, pp.199-220, 1993.

T. Hamon and S. Aubin, Improving Term Extraction with Terminological Resources, FinTAL '06, pp.380-387, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00091444

J. Heer, S. K. Card, and J. A. Landay, Prefuse : A Toolkit for Interactive Information Visualization, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp.421-430, 2005.

B. W. Heller, P. H. Veltink, N. J. Rijkhoff, W. L. Rutten, and B. J. Andrews, Reconstructing muscle activation during normal walking : a comparison of symbolic and connectionist machine learning techniques, Biological Cybernetics, vol.69, pp.327-335, 1993.

J. R. Hobbs, J. Bear, D. Israel, and M. Tyson, FASTUS : A finite-state processor for information extraction from real-world text, pp.1172-1178, 1993.

J. R. Hobbs and E. Riloff, Information Extraction, Handbook of Natural Language Processing, 2010.

C. Hsu and M. Dung, Generating finite-state transducers for semistructured data extraction from the Web, Inf. Syst, vol.23, pp.521-538, 1998.

J. Huysmans, B. Baesens, and J. Vanthienen, Using Rule Extraction to Improve the Comprehensibility of Predictive Models, 2006.

H. Isozaki and H. Kazawa, Efficient Support Vector Classifiers for Named Entity Recognition, Proceedings of the 19th International Conference on Computational Linguistics (COLING'02), pp.390-396, 2002.

A. Jain, P. Ipeirotis, and L. Gravano, Building Query Optimizers for Information Extraction : The SQoUT Project, SIGMOD Rec, vol.37, pp.28-34, 2009.

T. S. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu, Avatar Information Extraction System, IEEE Data Eng. Bull, vol.29, pp.40-48, 2006.

J. Jiang, Information Extraction from Text, Mining Text Data, pp.11-41, 2012.

M. P. Jones and J. H. Martin, Contextual spelling correction using latent semantic analysis, Proceedings of the fifth conference on applied natural language processing, pp.166-173, 1997.

R. Jones, R. Ghani, T. Mitchell, and E. Riloff, Active Learning for Information Extraction with Multiple View Feature Sets, Proceedings of the ECML-2003 workshop on Adaptive Text Extraction and Mining (ATEM-2003), 2003.

N. Kambhatla, Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations, Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, 2004.

J. Kazama, T. Makino, Y. Ohta, and J. Tsujii, Tuning support vector machines for biomedical named entity recognition, Proceedings of the ACL-02 workshop on natural language processing in the biomedical domain - Volume 3, pp.1-8, 2002.

J. Kim and D. I. Moldovan, Acquisition of Linguistic Patterns for Knowledge-Based Information Extraction, IEEE Trans. on Knowl. and Data Eng, vol.7, pp.713-724, 1995.

P. Kluegl, M. Atzmueller, T. Hermann, and F. Puppe, A Framework for Semi-Automatic Development of Rule-based Information Extraction Applications, Proc. LWA 2009 (KDML -Special Track on Knowledge Discovery and Machine Learning), pp.56-59, 2009.

P. Kluegl, M. Atzmueller, and F. Puppe, TextMarker : A Tool for Rule-Based Information Extraction, Proceedings of the Biennial GSCL Conference, pp.233-240, 2009.

V. Krishnan and C. D. Manning, An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition, Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp.1121-1128, 2006.

T. Kristjansson, A. Culotta, P. Viola, and A. Mccallum, Interactive Information Extraction with Constrained Conditional Random Fields, Proceedings of the 19th National Conference on Artificial Intelligence, pp.412-418, 2004.

N. Kushmerick, D. Weld, and B. Doorenbos, Wrapper induction for information extraction, Proc. Int. Joint Conf. Artificial Intelligence, 1997.

J. D. Lafferty, A. Mccallum, and F. C. Pereira, Conditional Random Fields : Probabilistic Models for Segmenting and Labeling Sequence Data, Proceedings of the Eighteenth International Conference on Machine Learning, pp.282-289, 2001.

T. Leek, Information Extraction Using Hidden Markov Models (unpublished Master's thesis), 1997.

W. G. Lehnert, C. Cardie, D. Fisher, E. Riloff, and R. Williams, University of Massachusetts : description of the CIRCUS system as used for MUC-3, MUC, pp.223-233, 1991.

B. Lemaire, Limites de la lemmatisation pour l'extraction de significations, 9e Journées internationales d'Analyse Statistique des Données Textuelles, pp.725-732, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00385750

D. D. Lewis and J. Catlett, Heterogeneous uncertainty sampling for supervised learning, Proceedings of ICML-94, 11th International Conference on Machine Learning, pp.148-156, 1994.

Y. Li and K. Bontcheva, Hierarchical, Perceptron-like Learning for Ontology-based Information Extraction, Proceedings of the 16th International Conference on World Wide Web, pp.777-786, 2007.

Y. Li, L. Chiticariu, H. Yang, F. R. Reiss, and A. Carreno-fuentes, WizIE : A Best Practices Guided Development Environment for Information Extraction, Proceedings of the ACL 2012 System Demonstrations, pp.109-114, 2012.

Y. Li, V. Chu, S. Blohm, H. Zhu, and H. Ho, Facilitating Pattern Discovery for Relation Extraction with Semantic-signature-based Clustering, Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp.1415-1424, 2011.

Y. Li, F. Reiss, and L. Chiticariu, SystemT : A Declarative Information Extraction System, ACL (System Demonstrations), pp.109-114, 2011.

H. Liu, T. Christiansen, W. A. Baumgartner, and K. Verspoor, BioLemmatizer : a lemmatization tool for morphological processing of biomedical text, Journal of biomedical semantics, p.3, 2012.

Y. Ma, L. Audibert, and A. Nazarenko, Ontologies étendues pour l'annotation sémantique, 20es Journées Francophones d'Ingénierie des Connaissances, pp.205-216, 2009.

A. Maedche, G. Neumann, and S. Staab, Intelligent Exploration of the Web, pp.345-359, 2003.

J. Makhoul, F. Kubala, R. Schwartz, and R. Weischedel, Performance Measures For Information Extraction, Proceedings of DARPA Broadcast News Workshop, pp.249-252, 1999.

D. Maynard, V. Tablan, C. Ursu, H. Cunningham, and Y. Wilks, Named Entity Recognition from Diverse Text Types, Proceedings of the Recent Advances in Natural Language Processing 2001 Conference, pp.257-274, 2001.

A. Mccallum, D. Freitag, and F. C. Pereira, Maximum Entropy Markov Models for Information Extraction and Segmentation, Proceedings of the Seventeenth International Conference on Machine Learning, pp.591-598, 2000.

A. Mccallum and W. Li, Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-enhanced Lexicons, Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pp.188-191, 2003.

L. Mcdowell, Ontology-driven information extraction with ontosyphon, International Semantic Web Conference, 2006.

G. A. Miller, WordNet : a lexical database for English, Commun. ACM, vol.38, pp.39-41, 1995.

I. Muslea, S. Minton, and C. A. Knoblock, Selective Sampling With Redundant Views, AAAI/IAAI, pp.621-626, 2000.

I. Muslea, S. Minton, and C. A. Knoblock, Active + Semi-supervised Learning = Robust Multi-View Learning, Proceedings of the Nineteenth International Conference on Machine Learning, pp.435-442, 2002.

C. Nédellec and A. Nazarenko, Ontology and Information Extraction : a necessary symbiosis, Ontology Design and Population, pp.155-170, 2005.

C. Nédellec, A. Nazarenko, and R. Bossy, Information Extraction, Handbook on Ontologies, pp.663-685, 2009.

C. Nobata, N. Collier, and J. Tsujii, Automatic term identification and classification in biology texts, Proc. of the 5th NLPRS, pp.369-374, 1999.

F. Olsson, A literature survey of active machine learning in the context of natural language processing, 2009.

F. Papazian, R. Bossy, and C. Nédellec, AlvisAE : A Collaborative Web Text Annotation Editor for Knowledge Acquisition, Proceedings of the Sixth Linguistic Annotation Workshop, pp.149-152, 2012.

F. Peng, F. Feng, and A. Mccallum, Chinese Segmentation and New Word Detection Using Conditional Random Fields, Proceedings of the 20th International Conference on Computational Linguistics, 2004.

G. Petasis, V. Karkaletsis, G. Paliouras, I. Androutsopoulos, and C. D. Spyropoulos, Ellogon : A New Text Engineering Platform, Proceedings of the 3rd International Conference on Language Resources and Evaluation, pp.72-78, 2002.

J. Piskorski and R. Yangarber, Information Extraction : Past, Present and Future, Multi-source, Multilingual Information Extraction and Summarization, pp.23-49, 2013.

R. Polikar, L. Udpa, S. Udpa et al., Learn++ : An Incremental Learning Algorithm for Supervised Neural Networks, IEEE Transactions on Systems, Man and Cybernetics (C), Special Issue on Knowledge Management, vol.31, pp.497-508, 2001.

K. Probst and R. Ghani, Towards 'Interactive' Active Learning in Multi-view Feature Sets for Information Extraction, Proceedings of the 18th European conference on Machine Learning, pp.683-690, 2007.

L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, pp.257-286, 1989.

S. Ray and M. Craven, Representing Sentence Structure in Hidden Markov Models for Information Extraction, Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp.1273-1279, 2001.

L. Reeve and H. Han, Survey of semantic annotation platforms, Proceedings of the 2005 ACM symposium on Applied computing, pp.1634-1638, 2005.

F. Reiss, S. Raghavan, R. Krishnamurthy, H. Zhu, and S. Vaithyanathan, An Algebraic Approach to Rule-Based Information Extraction, Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, pp.933-942, 2008.

D. Reymond, Dictionnaires distributionnels et étiquetage lexical de corpus, Actes des 3e Rencontres des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, pp.473-482, 2001.

E. Riloff, Automatically constructing a dictionary for information extraction tasks, Proceedings of the eleventh national conference on Artificial intelligence, pp.811-816, 1993.

E. Riloff, Automatically generating extraction patterns from untagged text, Proceedings of the thirteenth national conference on Artificial intelligence, pp.1044-1049, 1996.

N. Sager, C. Friedman, and M. S. Lyman, Medical Language Processing : Computer Management of Narrative Data, 1987.

S. Sarawagi, Information Extraction, Found. Trends Databases, vol.1, pp.261-377, 2008.

A. D. Sarma, A. Jain, and D. Srivastava, I4E : interactive investigation of iterative information extraction, SIGMOD Conference, pp.795-806, 2010.

U. Schäfer, Middleware for Creating and Combining Multi-dimensional NLP Markup, Proceedings of the 5th Workshop on NLP and XML : MultiDimensional Markup in Natural Language Processing, pp.81-84, 2006.

H. Schmid, Probabilistic Part-of-Speech Tagging Using Decision Trees, International Conference on New Methods in Language Processing, pp.44-49, 1994.

B. Settles, Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets, Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, pp.104-107, 2004.

B. Settles, Active Learning Literature Survey, 2009.

K. Seymore, A. Mccallum, and R. Rosenfeld, Learning Hidden Markov Model Structure for Information Extraction, AAAI 99 Workshop on Machine Learning for Information Extraction, pp.37-42, 1999.

W. Shen, A. Doan, J. F. Naughton, and R. Ramakrishnan, Declarative Information Extraction Using Datalog with Embedded Extraction Predicates, Proceedings of the 33rd International Conference on Very Large Data Bases, pp.1033-1044, 2007.

C. Siefkes and P. Siniakov, An Overview and Classification of Adaptive Approaches to Information Extraction, vol.3730, pp.172-212, 2005.

M. Skounakis, M. Craven, and S. Ray, Hierarchical Hidden Markov Models for Information Extraction, Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp.427-433, 2003.

L. Smith, T. Rindflesch, and W. J. Wilbur, MedPost : a part-of-speech tagger for bioMedical text, Bioinformatics, vol.20, pp.2320-2321, 2004.

S. Soderland, C. Cardie, and R. Mooney, Learning Information Extraction Rules for Semi-structured and Free Text, Machine Learning, pp.233-272, 1999.

S. Soderland, D. Fisher, J. Aseltine, and W. Lehnert, CRYSTAL : Inducing a Conceptual Dictionary, Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp.1314-1319, 1995.

S. G. Soderland, Learning Text Analysis Rules for Domain-specific Natural Language Processing, 1997.

C. Sönströd, U. Johansson, and R. König, Towards a Unified View on Concept Description, DMIN, pp.59-65, 2007.

I. Spasic, F. Sarafraz, J. A. Keane, and G. Nenadic, Medication information extraction with linguistic pattern matching and semantic rules, JAMIA, vol.17, pp.532-535, 2010.

R. Studer, V. R. Benjamins, and D. Fensel, Knowledge Engineering : Principles and Methods, Data Knowl. Eng, vol.25, pp.161-197, 1998.

A. Sun, M. Naing, E. Lim, and W. Lam, Using support vector machines for terrorism information extraction, ISI'03 : Proceedings of the 1st NSF/NIJ conference on Intelligence and security informatics, pp.1-12, 2003.

K. Takeuchi and N. Collier, Use of support vector machines in extended named entity recognition, Proceedings of the 6th conference on natural language learning - Volume 20, pp.1-7, 2002.

C. Thompson, M. Califf, and R. Mooney, Active Learning for Natural Language Parsing and Information Extraction, Proceedings of the International Conference on Machine Learning, pp.406-414, 1999.

J. Turmo, A. Ageno, and N. Català, Adaptive information extraction, ACM Comput. Surv, p.38, 2006.

V. N. Vapnik, The Nature of Statistical Learning Theory, 1995.

A. J. Viterbi, Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm, IEEE Transactions on Information Theory, issue.13, pp.260-269, 1967.

D. Z. Wang, E. Michelakis, M. J. Franklin, M. N. Garofalakis, and J. M. Hellerstein, Probabilistic declarative information extraction, ICDE, pp.173-176, 2010.

B. Wellner, M. Huyck, S. Mardis, J. Aberdeen, A. Morgan et al., Rapidly retargetable approaches to de-identification in medical records, Journal of the American Medical Informatics Association : JAMIA, vol.14, pp.564-573, 2007.

D. C. Wimalasuriya and D. Dou, Ontology-based information extraction : An introduction and a survey of current approaches, J. Inf. Sci, vol.36, pp.306-323, 2010.

G. Wisniewski and P. Gallinari, Relaxation Labeling for Selecting and Exploiting Efficiently Non-local Dependencies in Sequence Labeling, In PKDD, vol.4702, pp.312-323, 2007.

F. Wu, R. Hoffmann, and D. S. Weld, Information Extraction from Wikipedia : Moving Down the Long Tail, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.731-739, 2008.

T. Wu and W. M. Pottenger, A Semi-Supervised Active Learning Algorithm for Information Extraction from Textual Data, JASIST, vol.56, pp.258-271, 2005.

J. Xu and W. B. Croft, Corpus-based stemming using cooccurrence of word variants, ACM Trans. Inf. Syst, vol.16, issue.1, pp.61-81, 1998.

R. Yangarber, Scenario Customization for Information Extraction, 2001.

R. Yangarber, Counter-training in discovery of semantic patterns, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pp.343-350, 2003.

R. Yangarber, R. Grishman, P. Tapanainen, and S. Huttunen, Automatic acquisition of domain knowledge for Information Extraction, Proceedings of the 18th conference on Computational linguistics, pp.940-946, 2000.

D. Yarowsky, One sense per collocation, Proceedings of the workshop on human language technology, pp.266-271, 1993.

A. Yates, M. Banko, M. Broadhead, M. J. Cafarella, O. Etzioni et al., TextRunner : Open Information Extraction on the Web, Proceedings of Human Language Technologies : The Annual Conference of the North American Chapter of the Association for Computational Linguistics : Demonstrations, pp.25-26, 2007.

B. Yildiz and S. Miksch, ontoX -A Method for Ontology-Driven Information Extraction, ICCSA (3), pp.660-673, 2007.

H. Zaragoza and P. Gallinari, Coupled Hierarchical IR and Stochastic Models for Surface Information Extraction, BCS-IRSG Annual Colloquium on IR Research, 1998.
URL : https://hal.archives-ouvertes.fr/hal-01617393

G. Zhou and J. Su, Named entity recognition using an hmm-based chunk tagger, Proceedings of the 40th annual meeting on association for computational linguistics, pp.473-480, 2002.

Z. Zhou and Y. Jiang, Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble, IEEE Transactions on Information Technology in Biomedicine, vol.7, pp.37-42, 2003.