[...] the algorithm starts again with the uncovered examples, and repeats until every example in the training set is covered. The set of rules produced in this way constitutes the induced extractor. The experimental results of [81, 82] are comparable to those of other approaches, and better on corpora containing missing values. Unlike wien and stalker [...]
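The covering loop described in this passage can be illustrated with a minimal Python sketch. It is not the system of [81, 82]: the attribute-value representation of examples, the greedy `learn_one_rule` generalisation step, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List

Example = Dict[str, str]   # hypothetical representation: attribute -> value
Rule = Dict[str, str]      # a rule is a conjunction of attribute tests


def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example when every attribute test is satisfied."""
    return all(example.get(attr) == value for attr, value in rule.items())


def learn_one_rule(seed: Example, negatives: List[Example]) -> Rule:
    """Generalise the seed by dropping tests while no negative gets covered.

    This greedy step is an assumption; any rule learner that is
    guaranteed to cover its seed example could be plugged in here.
    """
    rule = dict(seed)
    for attr in list(rule):
        candidate = {a: v for a, v in rule.items() if a != attr}
        if not any(covers(candidate, neg) for neg in negatives):
            rule = candidate
    return rule


def sequential_covering(positives: List[Example],
                        negatives: List[Example]) -> List[Rule]:
    """Learn rules until every positive training example is covered."""
    rules: List[Rule] = []
    uncovered = list(positives)
    while uncovered:
        rule = learn_one_rule(uncovered[0], negatives)
        rules.append(rule)
        # Restart on the examples that the new rule does not cover.
        uncovered = [ex for ex in uncovered if not covers(rule, ex)]
    return rules


positives = [
    {"tag": "td", "class": "price"},
    {"tag": "td", "class": "name"},
    {"tag": "span", "class": "price"},
]
negatives = [
    {"tag": "div", "class": "ad"},
    {"tag": "span", "class": "ad"},
]
rules = sequential_covering(positives, negatives)
```

The loop terminates because each learned rule covers at least its seed, so the list of uncovered examples shrinks at every iteration. The returned list of rules plays the role of the induced extractor.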

[...] the n components do not always appear in the same order. It also handles missing values. However, the main criticism that can be made of it is the high algorithmic cost of deciding whether or not a rule extracts a given n-tuple. An extraction rule is applied by means of the θ-subsumption test, and it is well known in inductive logic programming that this test is NP-complete. Indeed [...]
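To make the cost of this test concrete, here is a minimal Python sketch of a naive θ-subsumption check by backtracking. The encoding is an assumption chosen for illustration (literals as tuples, variables as strings starting with an uppercase letter, hypothetical predicates `child` and `tag`); it is not the procedure used by the system under discussion.

```python
from typing import Dict, Optional, Sequence, Tuple

Literal = Tuple[str, ...]          # (predicate, arg1, arg2, ...)
Substitution = Dict[str, str]      # variable -> term


def is_variable(term: str) -> bool:
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def match_literal(pattern: Literal, target: Literal,
                  theta: Substitution) -> Optional[Substitution]:
    """Extend theta so that pattern·theta equals target, or return None."""
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None
    extended = dict(theta)
    for p_term, t_term in zip(pattern[1:], target[1:]):
        if is_variable(p_term):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def theta_subsumes(rule: Sequence[Literal], example: Sequence[Literal],
                   theta: Optional[Substitution] = None) -> bool:
    """True if some substitution maps every rule literal into the example."""
    theta = {} if theta is None else theta
    if not rule:
        return True
    first, rest = rule[0], rule[1:]
    for target in example:
        extended = match_literal(first, target, theta)
        if extended is not None and theta_subsumes(rest, example, extended):
            return True
    return False  # every candidate binding failed: backtrack


rule = [("child", "X", "Y"), ("tag", "Y", "td"),
        ("child", "X", "Z"), ("tag", "Z", "th")]
page = [("child", "n1", "n2"), ("child", "n1", "n3"),
        ("tag", "n2", "td"), ("tag", "n3", "th")]
matches = theta_subsumes(rule, page)
```

In the worst case each rule literal is tried against every literal of the example, so the search explores a number of combinations that grows exponentially with the length of the rule, which is consistent with the NP-completeness mentioned above.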

S. Abiteboul, Querying semi-structured data, ICDT, pp.1-18, 1997.
DOI : 10.1007/3-540-62222-5_33

A. Arasu and H. Garcia-Molina, Extracting structured data from Web pages, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, pp.337-348, 2003.
DOI : 10.1145/872757.872799

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

R. Baumgartner, S. Flesca, and G. Gottlob, Visual web information extraction with Lixto, 27th International Conference on Very Large Data Bases, pp.119-128, 2001.

T. Berners-Lee, J. Hendler, and O. Lassila, The Semantic Web, Scientific American, vol.284, issue.5, 2001.
DOI : 10.1038/scientificamerican0501-34

J. Berstel, Transductions and Context-Free Languages, Teubner Studienbücher, 1979.
DOI : 10.1007/978-3-663-09367-1

URL : https://hal.archives-ouvertes.fr/hal-00619779

P. Bohunsky and W. Gatterbauer, Table extraction using spatial reasoning on the CSS2 visual box model, 2006.

E. Brill, A simple rule-based part of speech tagger, Proceedings of the third conference on Applied natural language processing, pp.152-155, 1992.

P. Buneman, Semistructured data, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, PODS '97, pp.117-121, 1997.
DOI : 10.1145/263661.263675

M. E. Califf, F. Ciravegna, D. Freitag, C. Giuliano, N. Kushmerick et al., A critical survey of the methodology for IE evaluation, Proceedings of LREC 2004, 2004.

M. E. Califf, Relational learning techniques for natural language information extraction, 1998.

J. Carme, Inférence de requêtes dans les arbres et applications à l'extraction d'informations sur le Web, 2005.

J. Carme, R. Gilleron, A. Lemay, and J. Niehren, Interactive learning of node selecting tree transducer, IJCAI Workshop on Grammatical Inference, 2005.

J. Carme, R. Gilleron, A. Lemay, and J. Niehren, Interactive learning of node selecting tree transducer, Machine Learning, pp.33-67, 2007.
DOI : 10.1007/s10994-006-9613-8

C. Chang and S. Lui, IEPAD, Proceedings of the tenth international conference on World Wide Web, WWW '01, 2001.
DOI : 10.1145/371920.372182

B. Chidlovskii, J. Ragetli, and M. de Rijke, Wrapper Generation via Grammar Induction, ECML, pp.96-108, 2000.
DOI : 10.1007/3-540-45164-1_11

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

H. L. Chieu and H. T. Ng, A maximum entropy approach to information extraction from semi-structured and free text, Proceedings of Eighteenth national conference on Artificial intelligence, pp.786-791, 2002.

P. R. Cohen and E. A. Feigenbaum, The Handbook of Artificial Intelligence, 1982.

W. Cohen, M. Hurst, and L. Jensen, A Flexible Learning System for Wrapping Tables and Lists in HTML Documents, chapter in Web Document Analysis: Challenges and Opportunities, 2003.

A. Cornuéjols and L. Miclet, Apprentissage artificiel ; concepts et algorithmes. Eyrolles, 2002.

V. Crescenzi, G. Mecca, and P. Merialdo, RoadRunner: Towards automatic data extraction from large web sites, Proceedings of 27th International Conference on Very Large Data Bases, pp.109-118, 2001.

A. Crespo, J. Jannink, E. Neuhold, M. Rys, and R. Studer, A survey of semi-automatic extraction and transformation, 1994.

F. Decomité, F. Denis, R. Gilleron, and F. Letouzey, Comment améliorer l'apprentissage en utilisant des exemples positifs et des exemples non étiquetés, CAP 99, pp.133-144, 1999.

T. G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, vol.10, issue.7, pp.1895-1923, 1998.

A. Doan, J. Madhavan, P. Domingos, and A. Y. Halevy, Learning to map between ontologies on the semantic web, Proceedings of the eleventh international conference on World Wide Web, WWW '02, pp.662-673, 2002.
DOI : 10.1145/511446.511532

A. Finn and N. Kushmerick, Multi-level Boundary Classification for Information Extraction, Proceedings of the European Conference on Machine Learning, 2004.
DOI : 10.1007/978-3-540-30115-8_13

D. Florescu, A. Y. Levy, and A. O. Mendelzon, Database techniques for the World-Wide Web, ACM SIGMOD Record, vol.27, issue.3, 1998.
DOI : 10.1145/290593.290605

D. Freitag and N. Kushmerick, Boosted wrapper induction, AAAI/IAAI, pp.577-583, 2000.

D. Freitag and A. McCallum, Information extraction with HMMs and shrinkage, Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Proceedings of the 2nd European Conference on Computational Learning Theory, pp.23-37, 1995.

Y. Freund, Boosting a weak learning algorithm by majority, Proceedings of the Third Annual Workshop on Computational Learning Theory, pp.202-216, 1990.

Y. Freund and R. E. Schapire, Experiments with a new boosting algorithm, Proc. 13th International Conference on Machine Learning, pp.148-156, 1996.

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI : 10.1006/jcss.1997.1504

R. Gilleron, P. Marty, M. Tommasi, and F. Torre, Statistical classification for wrapper induction, Dagstuhl Seminar : Machine Learning for the Semantic Web, 2005.

R. Gilleron, P. Marty, M. Tommasi, and F. Torre, Extraction de relations dans les documents web, Revue RNTI -Actes de EGC'06, pp.415-420, 2006.

R. Gilleron, P. Marty, M. Tommasi, and F. Torre, Interactive Tuples Extraction from Semi-Structured Data, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), pp.997-1004, 2006.
DOI : 10.1109/WI.2006.102

URL : https://hal.archives-ouvertes.fr/inria-00581253

G. Gottlob, C. Koch, R. Baumgartner, M. Herzog, and S. Flesca, The Lixto data extraction project: back and forth between theory and practice, 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Database Systems, pp.1-12.

B. Habegger, Tree-pattern generalization for information extraction from the web, GRAPPA Report, 2005.

B. Habegger and M. Quafafou, Context Generalization for Information Extraction from the Web, IEEE/WIC/ACM International Conference on Web Intelligence (WI'04), pp.720-723, 2004.
DOI : 10.1109/WI.2004.10076

D. Harel and R. E. Tarjan, Fast Algorithms for Finding Nearest Common Ancestors, SIAM Journal on Computing, vol.13, issue.2, pp.338-355, 1984.
DOI : 10.1137/0213024

C. Hsu and M. Dung, Generating finite-state transducers for semi-structured data extraction from the Web, Information Systems, vol.23, issue.8, pp.521-538, 1998.
DOI : 10.1016/S0306-4379(98)00027-1

L. S. Jensen and W. W. Cohen, Grouping extracted fields, Proceedings of IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, 2001.

F. Jousse, Transformations d'Arbres XML avec des Modèles Probabilistes pour l'Annotation, 2007.

F. Jousse, R. Gilleron, I. Tellier, and M. Tommasi, Champs conditionnels aléatoires pour l'annotation d'arbres, 8ème Conférence francophone sur l'Apprentissage automatique (CAp'2006), pp.171-186, 2006.

F. Jousse, R. Gilleron, I. Tellier, and M. Tommasi, Conditional random fields for XML trees, ECML Workshop on Mining and Learning in Graphs, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00118761

R. Kohavi and C. Kunz, Option decision trees with majority votes, Proc. 14th International Conference on Machine Learning, pp.161-169, 1997.

R. Kosala, M. Bruynooghe, J. Van den Bussche, and H. Blockeel, Information extraction from web documents based on local unranked tree automaton inference, 18th International Joint Conference on Artificial Intelligence, pp.403-408, 2003.

R. Kosala and H. Blockeel, Instance-based wrapper induction, Proceedings of the Tenth Belgian-Dutch Conference on Machine Learning, pp.61-68, 2000.

T. T. Kristjansson, A. Culotta, P. Viola, and A. McCallum, Interactive information extraction with constrained conditional random fields, Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2004.

N. Kushmerick, Wrapper Induction for Information Extraction, 1997.

N. Kushmerick, Wrapper induction: Efficiency and expressiveness, Artificial Intelligence, vol.118, issue.1-2, pp.15-68, 2000.
DOI : 10.1016/S0004-3702(99)00100-9

URL : http://doi.org/10.1016/s0004-3702(99)00100-9

N. Kushmerick, Finite-State Approaches to Web Information Extraction, Proc. 3rd Summer Convention on Information Extraction, 2002.
DOI : 10.1007/978-3-540-45092-4_4

A. H. Laender and B. Ribeiro-Neto, DEByE - Data Extraction By Example, Data & Knowledge Engineering, vol.40, issue.2, 2001.
DOI : 10.1016/S0169-023X(01)00047-7

A. H. Laender, B. Ribeiro-Neto, A. S. Silva, and J. S. Teixeira, A brief survey of web data extraction tools, ACM SIGMOD Record, vol.31, issue.2, pp.84-93, 2002.
DOI : 10.1145/565117.565137

A. Lemay, J. Niehren, and R. Gilleron, Learning n-Ary Node Selecting Tree Transducers from Completely Annotated Examples, International Colloquium on Grammatical Inference, pp.253-267, 2006.
DOI : 10.1007/11872436_21

URL : https://hal.archives-ouvertes.fr/inria-00088077

K. Lerman, C. A. Knoblock, and S. Minton, Automatic data extraction from lists and tables in web sources, Proceedings of Automatic Text Extraction and Mining workshop (ATEM-01), IJCAI-01, 2001.

F. Letouzey, F. Denis, and R. Gilleron, Learning From Positive and Unlabeled Examples, ALT'00, Eleventh International Conference on Algorithmic Learning Theory, pp.71-85, 2000.
DOI : 10.1007/3-540-40992-0_6

URL : https://hal.archives-ouvertes.fr/inria-00538887

L. Liu, C. Pu, and W. Han, XWRAP: An XML-enabled wrapper construction system for web information sources, ICDE, pp.611-621, 2000.

Classer pour extraire : représentations et méthodes.

P. Marty and F. Torre, Codages et connaissances en extraction d'information, 6ème Conférence francophone sur l'apprentissage automatique, pp.207-222, 2004.

T. Mitchell, Machine Learning, 1997.

T. M. Mitchell, Generalization as search, Artificial Intelligence, vol.18, issue.2, pp.203-226, 1982.

S. Muggleton and L. De Raedt, Inductive Logic Programming: Theory and methods, The Journal of Logic Programming, vol.19-20, pp.629-679, 1994.
DOI : 10.1016/0743-1066(94)90035-3

URL : http://doi.org/10.1016/0743-1066(94)90035-3

I. Muslea, S. Minton, and C. A. Knoblock, Hierarchical wrapper induction for semistructured information sources, Autonomous Agents and Multi-Agent Systems, vol.4, issue.1/2, pp.93-114, 2001.
DOI : 10.1023/A:1010022931168

J. Oncina and P. García, Inference of recognizable tree sets, Departamento de Sistemas Informáticos y Computación, 1993.

M. T. Pazienza, Information extraction: Towards scalable, adaptable systems, Lecture Notes in Artificial Intelligence, vol.1714, 1997.
DOI : 10.1007/3-540-48089-7

D. Pinto, A. McCallum, X. Wei, and W. B. Croft, Table extraction using conditional random fields, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '03, pp.235-242, 2003.
DOI : 10.1145/860435.860479

G. Plotkin, A note on inductive generalization, Machine Intelligence, pp.153-165, 1970.

T. Poibeau, Extraction automatique d'information. Hermès, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00098054

J. R. Quinlan, C4.5 : Programs for Machine Learning, 1993.

J. R. Quinlan and R. L. Rivest, Inferring decision trees using the minimum description length principle, Information and Computation, vol.80, issue.3, pp.227-248, 1989.
DOI : 10.1016/0890-5401(89)90010-2

J. R. Quinlan, Data mining tools See5 and C5.0, 2004.

A. Sahuguet and F. Azavant, Building intelligent Web applications using lightweight wrappers, Data & Knowledge Engineering, vol.36, issue.3, pp.283-316, 2001.
DOI : 10.1016/S0169-023X(00)00051-3

S. Sarawagi and W. W. Cohen, Semi-Markov conditional random fields for information extraction, Proceedings of NIPS, pp.1185-1192, 2004.

R. E. Schapire, The Boosting Approach to Machine Learning: An Overview, Proc. MSRI Workshop on Nonlinear Estimation and Classification, 2002.
DOI : 10.1007/978-0-387-21579-2_9

R. E. Schapire and Y. Singer, Improved boosting algorithms using confidence-rated predictions, Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT-98), pp.80-91, 1998.

S. Soderland, Learning information extraction rules for semi-structured and free text, Machine Learning, pp.233-272, 1999.

J. Tang, M. Hong, J. Li, and B. Liang, Tree-Structured Conditional Random Fields for Semantic Annotation, The Semantic Web - ISWC 2006, pp.640-653, 2006.
DOI : 10.1007/11926078_46

J. W. Thatcher and J. B. Wright, Generalized finite automata theory with an application to a decision problem of second-order logic, Mathematical Systems Theory, pp.57-82, 1968.
DOI : 10.1007/BF01691346

B. Thomas, Bottom-Up Learning of Logic Programs for Information Extraction from Hypertext Documents, Proceedings of European Conference on Machine Learning / Principles and Practice of Knowledge Discovery in Databases ECML/PKDD 2003, 2003.
DOI : 10.1007/978-3-540-39804-2_39

B. Thomas, Machine Learning of Information Extraction Procedures - An ILP Approach, 2005.

F. Torre, GloBoost : Boosting de moindres généralisés, 6ème Conférence francophone sur l'apprentissage automatique, pp.49-64, 2004.

F. Torre, GloBoost : Combinaisons de moindres généralisés, Revue d'intelligence artificielle, vol.19, issue.4-5, pp.769-797, 2005.
DOI : 10.3166/ria.19.769-797

G. I. Webb and J. W. M. Agar, Inducing diagnostic rules for glomerular disease with the DLG machine learning algorithm, Artificial Intelligence in Medicine, vol.4, pp.419-430, 1992.

L. Zamboulis, XML schema matching & XML data migration & integration: A step towards the semantic web vision, 2003.

Y. Zhai and B. Liu, Extracting web data using instance-based learning, Proceedings of Web Information Systems Engineering WISE, pp.318-331, 2005.

Y. Zhai and B. Liu, Web data extraction based on partial tree alignment, Proceedings of the 14th international conference on World Wide Web, WWW '05, pp.76-85, 2005.
DOI : 10.1145/1060745.1060761