algorithm restarts with the uncovered examples, and so on until every example in the training set is covered. The resulting set of rules constitutes the induced extractor. The experimental results reported in [81, 82] are comparable to those of the other approaches, and better on corpora containing missing values. Unlike WIEN and STALKER, […]
Moreover, it also handles missing values. However, the main criticism that can be levelled at it is the high algorithmic cost of determining whether or not a rule extracts an n-tuple. An extraction rule is applied by means of the θ-subsumption test, and it is well known in inductive logic programming that this test is NP-complete. Indeed, […]
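The covering loop and the rule-application test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [81, 82]:

- Examples are lists of ground facts.
- Rules are conjunctions of literals whose Prolog-style uppercase terms are variables.
- A rule covers an example when it θ-subsumes it, that is, when some substitution θ maps every literal of the rule onto a fact of the example.

The function names `theta_subsumes`, `sequential_covering` and `greedy_learner`, the predicates `child` and `tag`, and the toy data are all hypothetical. The greedy learner only stands in for the real rule-construction step. The backtracking search in `theta_subsumes` may try exponentially many partial substitutions in the worst case, consistent with the NP-completeness of the test.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A literal is (predicate, arg1, arg2, ...), e.g. ("child", "P", "X").
Literal = Tuple[str, ...]
Substitution = Dict[str, str]


def is_var(term: str) -> bool:
    """Assumed Prolog-style convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def match(lit: Literal, fact: Literal, theta: Substitution) -> Optional[Substitution]:
    """Extend theta so that lit.theta == fact, or return None if impossible."""
    if lit[0] != fact[0] or len(lit) != len(fact):
        return None
    extended = dict(theta)
    for term, const in zip(lit[1:], fact[1:]):
        if is_var(term):
            # Bind the variable, or check that its existing binding agrees.
            if extended.setdefault(term, const) != const:
                return None
        elif term != const:
            return None
    return extended


def theta_subsumes(clause: Sequence[Literal], facts: Sequence[Literal],
                   theta: Optional[Substitution] = None) -> Optional[Substitution]:
    """Backtracking search for theta such that clause.theta is a subset of facts.

    Each literal may be tried against every fact, so the search is
    exponential in the number of literals in the worst case.
    """
    theta = {} if theta is None else theta
    if not clause:
        return theta
    first, rest = clause[0], clause[1:]
    for fact in facts:
        extended = match(first, fact, theta)
        if extended is not None:
            result = theta_subsumes(rest, facts, extended)
            if result is not None:
                return result
    return None  # no binding of `first` works: backtrack


def covers(rule: Sequence[Literal], example: Sequence[Literal]) -> bool:
    """A rule covers an example if it theta-subsumes the example's facts."""
    return theta_subsumes(rule, example) is not None


Rule = Sequence[Literal]
Example = Sequence[Literal]


def sequential_covering(positives: List[Example],
                        learn_one_rule: Callable[[List[Example]], Rule]) -> List[Rule]:
    """Learn a rule, drop the examples it covers, restart on the rest."""
    rules: List[Rule] = []
    remaining = list(positives)
    while remaining:
        rule = learn_one_rule(remaining)
        uncovered = [e for e in remaining if not covers(rule, e)]
        if len(uncovered) == len(remaining):
            break  # the new rule covers nothing: stop instead of looping forever
        rules.append(rule)
        remaining = uncovered
    return rules


def greedy_learner(candidates: List[Rule]) -> Callable[[List[Example]], Rule]:
    """Hypothetical stand-in: pick the candidate covering the most remaining examples."""
    def learn(remaining: List[Example]) -> Rule:
        return max(candidates, key=lambda r: sum(covers(r, e) for e in remaining))
    return learn


# Toy examples: each is a list of ground facts describing a document fragment.
e1 = [("child", "n0", "n1"), ("tag", "n1", "td")]
e2 = [("child", "m0", "m1"), ("tag", "m1", "td"), ("tag", "m0", "tr")]
e3 = [("child", "k0", "k1"), ("tag", "k1", "span")]
td_rule = [("child", "P", "X"), ("tag", "X", "td")]
span_rule = [("child", "P", "X"), ("tag", "X", "span")]

print(theta_subsumes(td_rule, e1))  # {'P': 'n0', 'X': 'n1'}
print(sequential_covering([e1, e2, e3], greedy_learner([td_rule, span_rule])))
```

On this data, the first call binds `P` and `X` to `n0` and `n1`. The covering loop first keeps the `td` rule, which covers `e1` and `e2`. It then restarts on the only uncovered example, `e3`, and adds the `span` rule.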
Querying semi-structured data, ICDT, pp.1-18, 1997.
DOI: 10.1007/3-540-62222-5_33
Extracting structured data from Web pages, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, pp.337-348, 2003.
DOI: 10.1145/872757.872799
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.9515
Visual web information extraction with Lixto, 27th International Conference on Very Large Data Bases, pp.119-128, 2001.
The Semantic Web, Scientific American, vol.284, issue.5, 2001.
DOI: 10.1038/scientificamerican0501-34
Transductions and Context-Free Languages, Teubner Studienbücher, 1979.
DOI: 10.1007/978-3-663-09367-1
URL: https://hal.archives-ouvertes.fr/hal-00619779
Table extraction using spatial reasoning on the CSS2 visual box model, 2006.
A simple rule-based part of speech tagger, Proceedings of the third conference on Applied natural language processing, pp.152-155, 1992.
Semistructured data, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, PODS '97, pp.117-121, 1997.
DOI: 10.1145/263661.263675
A critical survey of the methodology for IE evaluation, Proceedings of LREC 2004, 2004.
Relational learning techniques for natural language information extraction, 1998.
Inférence de requêtes dans les arbres et applications à l'extraction d'informations sur le Web, 2005.
Interactive learning of node selecting tree transducer, IJCAI Workshop on Grammatical Inference, 2005.
DOI: 10.1007/s10994-006-9613-8
Interactive learning of node selecting tree transducer, Machine Learning, pp.33-67, 2007.
DOI: 10.1007/s10994-006-9613-8
IEPAD, Proceedings of the tenth international conference on World Wide Web, WWW '01, 2001.
DOI: 10.1145/371920.372182
Wrapper Generation via Grammar Induction, ECML, pp.96-108, 2000.
DOI: 10.1007/3-540-45164-1_11
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.7090
A maximum entropy approach to information extraction from semi-structured and free text, Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp.786-791, 2002.
The Handbook of Artificial Intelligence, 1982.
Web Document Analysis: Challenges and Opportunities, chapter A Flexible Learning System for Wrapping Tables and Lists in HTML Documents, 2003.
Apprentissage artificiel : concepts et algorithmes, Eyrolles, 2002.
RoadRunner: Towards automatic data extraction from large web sites, Proceedings of the 27th International Conference on Very Large Data Bases, pp.109-118, 2001.
A survey of semi-automatic extraction and transformation, 1994.
Comment améliorer l'apprentissage en utilisant des exemples positifs et des exemples non étiquetés, CAP 99, pp.133-144, 1999.
Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, vol.10, issue.7, pp.1895-1923, 1998.
Learning to map between ontologies on the semantic web, Proceedings of the eleventh international conference on World Wide Web, WWW '02, pp.662-673, 2002.
DOI: 10.1145/511446.511532
Multi-level Boundary Classification for Information Extraction, Proceedings of the European Conference on Machine Learning, 2004.
DOI: 10.1007/978-3-540-30115-8_13
Database techniques for the World-Wide Web, ACM SIGMOD Record, vol.27, issue.3, 1998.
DOI: 10.1145/290593.290605
Boosted wrapper induction, AAAI/IAAI, pp.577-583, 2000.
Information extraction with HMMs and shrinkage, Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
A decision-theoretic generalization of on-line learning and an application to boosting, Proceedings of the 2nd European Conference on Computational Learning Theory, pp.23-37, 1995.
Boosting a weak learning algorithm by majority, Proceedings of the Third Annual Workshop on Computational Learning Theory, pp.202-216, 1990.
Experiments with a new boosting algorithm, Proc. 13th International Conference on Machine Learning, pp.148-156, 1996.
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI: 10.1006/jcss.1997.1504
Statistical classification for wrapper induction, Dagstuhl Seminar: Machine Learning for the Semantic Web, 2005.
Extraction de relations dans les documents web, Revue RNTI - Actes de EGC'06, pp.415-420, 2006.
Interactive Tuples Extraction from Semi-Structured Data, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI'06), pp.997-1004, 2006.
DOI: 10.1109/WI.2006.102
URL: https://hal.archives-ouvertes.fr/inria-00581253
The Lixto data extraction project: back and forth between theory and practice, 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp.1-12.
Tree-pattern generalization for information extraction from the web, GRAPPA Report, 2005.
Context Generalization for Information Extraction from the Web, IEEE/WIC/ACM International Conference on Web Intelligence (WI'04), pp.720-723, 2004.
DOI: 10.1109/WI.2004.10076
Fast Algorithms for Finding Nearest Common Ancestors, SIAM Journal on Computing, vol.13, issue.2, pp.338-355, 1984.
DOI: 10.1137/0213024
Generating finite-state transducers for semi-structured data extraction from the Web, Information Systems, vol.23, issue.8, pp.521-538, 1998.
DOI: 10.1016/S0306-4379(98)00027-1
Grouping extracted fields, Proceedings of IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, 2001.
Transformations d'Arbres XML avec des Modèles Probabilistes pour l'Annotation, 2007.
Champs conditionnels aléatoires pour l'annotation d'arbres, 8ème Conférence francophone sur l'Apprentissage automatique (CAp'2006), pp.171-186, 2006.
Conditional random fields for XML trees, ECML Workshop on Mining and Learning in Graphs, 2006.
URL: https://hal.archives-ouvertes.fr/inria-00118761
Option decision trees with majority votes, Proc. 14th International Conference on Machine Learning, pp.161-169, 1997.
Information extraction from web documents based on local unranked tree automaton inference, 18th International Joint Conference on Artificial Intelligence, pp.403-408, 2003.
Instance-based wrapper induction, Proceedings of the Tenth Belgian-Dutch Conference on Machine Learning, pp.61-68, 2000.
Interactive information extraction with constrained conditional random fields, Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2004.
Wrapper Induction for Information Extraction, 1997.
Wrapper induction: Efficiency and expressiveness, Artificial Intelligence, vol.118, issue.1-2, pp.15-68, 2000.
DOI: 10.1016/S0004-3702(99)00100-9
URL: http://doi.org/10.1016/s0004-3702(99)00100-9
Finite-State Approaches to Web Information Extraction, Proc. 3rd Summer Convention on Information Extraction, 2002.
DOI: 10.1007/978-3-540-45092-4_4
DEByE - Data Extraction By Example, Data & Knowledge Engineering, vol.40, issue.2, 2001.
DOI: 10.1016/S0169-023X(01)00047-7
A brief survey of web data extraction tools, ACM SIGMOD Record, vol.31, issue.2, pp.84-93, 2002.
DOI: 10.1145/565117.565137
Learning n-Ary Node Selecting Tree Transducers from Completely Annotated Examples, International Colloquium on Grammatical Inference, pp.253-267, 2006.
DOI: 10.1007/11872436_21
URL: https://hal.archives-ouvertes.fr/inria-00088077
Automatic data extraction from lists and tables in web sources, Proceedings of Automatic Text Extraction and Mining workshop (ATEM-01), IJCAI-01, 2001.
Learning From Positive and Unlabeled Examples, ALT'00, Eleventh International Conference on Algorithmic Learning Theory, pp.71-85, 2000.
DOI: 10.1007/3-540-40992-0_6
URL: https://hal.archives-ouvertes.fr/inria-00538887
XWRAP: An XML-enabled wrapper construction system for web information sources, ICDE, pp.611-621, 2000.
Classer pour extraire : représentations et méthodes.
Codages et connaissances en extraction d'information, 6ème Conférence francophone sur l'apprentissage automatique, pp.207-222, 2004.
Machine Learning, 1997.
Generalization as search, Artif. Intell., vol.18, issue.2, pp.203-226, 1982.
Inductive Logic Programming: Theory and methods, The Journal of Logic Programming, vol.19-20, pp.629-679, 1994.
DOI: 10.1016/0743-1066(94)90035-3
URL: http://doi.org/10.1016/0743-1066(94)90035-3
Hierarchical wrapper induction for semistructured information sources, Autonomous Agents and Multi-Agent Systems, vol.4, issue.1/2, pp.93-114, 2001.
DOI: 10.1023/A:1010022931168
Inference of recognizable tree sets, Departamento de Sistemas Informáticos y Computación, 1993.
Information extraction: Towards scalable, adaptable systems, Lecture Notes in Artificial Intelligence, vol.1714, 1997.
DOI: 10.1007/3-540-48089-7
Table extraction using conditional random fields, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '03, pp.235-242, 2003.
DOI: 10.1145/860435.860479
A note on inductive generalization, Machine Intelligence, pp.153-165, 1970.
Extraction automatique d'information, Hermès, 2003.
URL: https://hal.archives-ouvertes.fr/hal-00098054
C4.5: Programs for Machine Learning, 1993.
Inferring decision trees using the minimum description length principle, Information and Computation, vol.80, issue.3, pp.227-248, 1989.
DOI: 10.1016/0890-5401(89)90010-2
Data mining tools See5 and C5, 2004.
Building intelligent Web applications using lightweight wrappers, Data & Knowledge Engineering, vol.36, issue.3, pp.283-316, 2001.
DOI: 10.1016/S0169-023X(00)00051-3
Semi-Markov conditional random fields for information extraction, Proceedings of NIPS, pp.1185-1192, 2004.
The Boosting Approach to Machine Learning: An Overview, Proc. MSRI Workshop on Nonlinear Estimation and Classification, 2002.
DOI: 10.1007/978-0-387-21579-2_9
Improved boosting algorithms using confidence-rated predictions, Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT-98), pp.80-91, 1998.
Learning information extraction rules for semi-structured and free text, Machine Learning, pp.233-272, 1999.
Tree-Structured Conditional Random Fields for Semantic Annotation, The Semantic Web - ISWC 2006, pp.640-653, 2006.
DOI: 10.1007/11926078_46
Generalized finite automata theory with an application to a decision problem of second-order logic, Mathematical Systems Theory, pp.57-82, 1968.
DOI: 10.1007/BF01691346
Bottom-Up Learning of Logic Programs for Information Extraction from Hypertext Documents, Proceedings of European Conference on Machine Learning / Principles and Practice of Knowledge Discovery in Databases ECML/PKDD 2003, 2003.
DOI: 10.1007/978-3-540-39804-2_39
Machine Learning of Information Extraction Procedures - An ILP Approach, 2005.
Boosting de moindres généralisés, 6ème Conférence francophone sur l'apprentissage automatique, pp.49-64, 2004.
GlOBOOST. Combinaisons de moindres généralisés, Revue d'intelligence artificielle, vol.19, issue.4-5, pp.769-797, 2005.
DOI: 10.3166/ria.19.769-797
Inducing diagnostic rules for glomerular disease with the DLG machine learning algorithm, Artificial Intelligence in Medicine, vol.4, pp.419-430, 1992.
XML schema matching & XML data migration & integration: A step towards the semantic web vision, 2003.
Extracting web data using instance-based learning, Proceedings of Web Information Systems Engineering WISE, pp.318-331, 2005.
Web data extraction based on partial tree alignment, Proceedings of the 14th international conference on World Wide Web, WWW '05, pp.76-85, 2005.
DOI: 10.1145/1060745.1060761