M. Faheem and P. Senellart, Crawl intelligent et adaptatif d'applications web pour l'archivage du web, Ingénierie des systèmes d'information, vol.19, issue.4, pp.61-86, 2014.
DOI : 10.3166/isi.19.4.61-86

URL : https://hal.archives-ouvertes.fr/hal-01069818

M. Faheem and P. Senellart, Demonstrating intelligent crawling and archiving of web applications, Proceedings of the 22nd ACM international conference on information & knowledge management, CIKM '13, 2013.
DOI : 10.1145/2505515.2508197

URL : https://hal.archives-ouvertes.fr/hal-00952006

M. Faheem and P. Senellart, Intelligent and Adaptive Crawling of Web Applications for Web Archiving, Proc. ICWE, 2013.
DOI : 10.1007/978-3-642-39200-9_26

URL : https://hal.archives-ouvertes.fr/hal-00874444

M. Faheem, Intelligent crawling of Web applications for Web archiving, Proc. PhD Symposium of WWW, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00874444

M. Faheem and P. Senellart, Une démonstration d'un crawler intelligent pour les applications Web, Proc. BDA Demonstration. Conference without formal proceedings, 2013.

M. Faheem and P. Senellart, Collecte intelligente et adaptative d'applications Web pour l'archivage du Web, Proc. BDA. Conference without formal proceedings, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00952133

M. Faheem, T. Furche, G. Grasso, and C. Schallhart, OWET: A Comprehensive Toolkit for Wrapper Induction and Scalable Data Extraction

M. Faheem and P. Senellart, Adaptive Crawling Driven by Structure-Based Link Classification
DOI : 10.1007/978-3-319-27974-9_5

D. Ahlers and S. Boll, Adaptive geospatially focused crawling, Proceeding of the 18th ACM conference on Information and knowledge management, CIKM '09, 2009.
DOI : 10.1145/1645953.1646011

E. Amitay, D. Carmel, A. Darlow, R. Lempel, and A. Soffer, The connectivity sonar, Proceedings of the fourteenth ACM conference on Hypertext and hypermedia, HYPERTEXT '03, 2003.
DOI : 10.1145/900051.900060

H. Artail and K. Fawaz, A fast HTML web page change detection approach based on hashing and reducing the number of similarity computations, Data & Knowledge Engineering, vol.66, issue.2, pp.326-337, 2008.
DOI : 10.1016/j.datak.2008.04.003

A. Arasu and H. Garcia-Molina, Extracting structured data from Web pages, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, 2003.
DOI : 10.1145/872757.872799

J. Alpert and N. Hajaj, We knew the web was big, 2008.

G. Almpanidis, C. Kotropoulos, and I. Pitas, Combining text and link analysis for focused crawling: An application for vertical search engines, Information Systems, vol.32, issue.6, pp.886-908, 2007.
DOI : 10.1016/j.is.2006.09.004

G. O. Arocena and A. O. Mendelzon, WebOQL: Restructuring documents, databases and Webs, Proceedings of the Fourteenth International Conference on Data Engineering, 1998.

A. Arvidson, K. Persson, and J. Mannerheim, The Kulturarw3 Project - The Royal Swedish Web Archiw3e - An example of "complete" collection of web pages, Proceedings of the 66th IFLA Council and General Conference, 2000.

C. Bertoli, V. Crescenzi, and P. Merialdo, Crawling programs for wrapper-based applications, 2008 IEEE International Conference on Information Reuse and Integration, 2008.
DOI : 10.1109/IRI.2008.4583023

P. Boldi, B. Codenotti, M. Santini, and S. Vigna, UbiCrawler: a scalable fully distributed Web crawler, Software: Practice and Experience, vol.34, issue.8, pp.711-726, 2004.
DOI : 10.1002/spe.587

L. Blanco, N. N. Dalvi, and A. Machanavajjhala, Highly efficient algorithms for structural clustering of large websites, Proceedings of the 20th international conference on World wide web, WWW '11, 2011.
DOI : 10.1145/1963405.1963468

M. K. Bergman, The deep web: Surfacing hidden value, 2000.

L. Barbosa and J. Freire, Siphoning hidden-web data through keyword-based interfaces, Proceedings of the 19th Brazilian Symposium on Databases, 2004.

L. Barbosa and J. Freire, Searching for hidden-web databases, WebDB, 2005.

L. Barbosa and J. Freire, An adaptive crawler for locating hidden-Web entry points, WWW, 2007.

B. E. Boser, I. Guyon, and V. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory, COLT '92, 1992.
DOI : 10.1145/130385.130401

S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, WWW, 1998.
DOI : 10.1016/S0169-7552(98)00110-X

G. Brumfiel, The first web page, amazingly, is lost, 2013.

S. Bailey and D. Thompson, UKWAC, D-Lib Magazine, vol.12, issue.1, 2006.
DOI : 10.1045/january2006-thompson

URL : http://doi.org/10.1045/january2006-thompson

M. Burner, Crawling Towards Eternity: Building an Archive of the World Wide Web, Web Techniques Magazine, 1997.

R. Baeza-Yates, C. Castillo, M. Marin, and A. Rodriguez, Crawling a country, Special interest tracks and posters of the 14th international conference on World Wide Web, WWW '05, 2005.
DOI : 10.1145/1062745.1062768

Z. Bar-Yossef, I. Keidar, and U. Schonfeld, Do not crawl in the DUST: Different URLs with similar text, WWW, 2007.

J. Cho and H. Garcia-Molina, The evolution of the web and implications for an incremental crawler, VLDB, 2000.

B. Chidlovskii, Automatic repairing of web wrappers, Proceeding of the third international workshop on Web information and data management, WIDM '01, 2001.
DOI : 10.1145/502932.502938

C. Chang, M. Kayed, M. R. Girgis, and K. F. Shaalan, A Survey of Web Information Extraction Systems, IEEE Transactions on Knowledge and Data Engineering, vol.18, issue.10, pp.1411-1428, 2006.
DOI : 10.1109/TKDE.2006.152

K. Chellapilla and A. Maykov, A taxonomy of JavaScript redirection spam, Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, AIRWeb '07, 2007.
DOI : 10.1145/1244408.1244423

J. Petrak, Y. Li, and W. Peters, Text Processing with GATE (Version 6).

V. Crescenzi, G. Mecca, and P. Merialdo, RoadRunner: Towards automatic data extraction from large web sites, VLDB, 2001.

V. Crescenzi, G. Mecca, and P. Merialdo, RoadRunner: automatic data extraction from data-intensive web sites, SIGMOD, 2002.

V. Crescenzi, P. Merialdo, and P. Missier, Fine-grain Web site structure discovery, WIDM, 2003.

V. Crescenzi, P. Merialdo, and P. Missier, Clustering Web pages based on their structure, Data & Knowledge Engineering, vol.54, issue.3, pp.279-299, 2005.
DOI : 10.1016/j.datak.2004.11.004

S. Coleman, Blogs and the new politics of listening, The Political Quarterly, pp.272-280, 2008.

S. Chakrabarti, M. van den Berg, and B. Dom, Focused crawling: a new approach to topic-specific Web resource discovery, Computer Networks, vol.31, issue.11-16, pp.1623-1640, 1999.
DOI : 10.1016/S1389-1286(99)00052-3

W. Cathro, C. Webb, and J. Whiting, Archiving the web: The PANDORA archive at the National Library of Australia, National Library of Australia Staff Papers, 2009.

R. Cai, J. Yang, W. Lai, Y. Wang, and L. Zhang, iRobot, Proceeding of the 17th international conference on World Wide Web, WWW '08, 2008.
DOI : 10.1145/1367497.1367558

Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. Fischer, Path sharing and predicate evaluation for high-performance XML filtering, ACM Transactions on Database Systems, vol.28, issue.4, pp.467-516, 2003.
DOI : 10.1145/958942.958947

P. De Bra and R. Post, Information retrieval in the World-Wide Web: Making client-based searching feasible, WWW, 1994.
DOI : 10.1016/0169-7552(94)90132-5

M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, Focused crawling using context graphs, VLDB, 2000.

M. de Kunder, The indexed Web, 2013.

D. Denev, A. Mazeika, M. Spaniol, and G. Weikum, SHARC, Proc. VLDB Endow., pp.586-597, 2009.
DOI : 10.14778/1687627.1687694

URL : https://hal.archives-ouvertes.fr/hal-01122670

M. Ehrig and A. Maedche, Ontology-focused crawling of Web documents, Proceedings of the 2003 ACM symposium on Applied computing, SAC '03, 2003.
DOI : 10.1145/952532.952761

E. J. Etemad, Cascading style sheets (CSS) snapshot 2007, 2008.

E. Ferrara and R. Baumgartner, Automatic Wrapper Adaptation by Tree Edit Distance Matching, Combinations of Intelligent Methods and Applications, 2010.
DOI : 10.1007/978-3-642-19618-8_3

T. Furche, G. Gottlob, G. Grasso, C. Schallhart, and A. Sellers, OXPath: A language for scalable, memory-efficient data extraction from Web applications, p.4, 2011.

T. Furche, G. Gottlob, G. Grasso, O. Gunes, X. Guo et al., DIADEM, Proceedings of the 21st international conference companion on World Wide Web, WWW '12 Companion, 2012.
DOI : 10.1145/2187980.2188025

D. Fetterly, M. Manasse, M. Najork, and J. Wiener, A large-scale study of the evolution of web pages, WWW, 2003.

D. Freitag, Information extraction from HTML: Application of a general machine learning approach, Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.

T. Goan, N. Benson, and O. Etzioni, A grammar inference algorithm for the world wide web, AAAI, 1996.

J. Giles, Internet encyclopaedias go head to head, Nature, vol.438, issue.7070, 2005.
DOI : 10.1038/438900a

W. Gao, H. C. Lee, and Y. Miao, Geographically focused collaborative crawling, Proceedings of the 15th international conference on World Wide Web, WWW '06, 2006.
DOI : 10.1145/1135777.1135822

Y. Guo, K. Li, K. Zhang, and G. Zhang, Board Forum Crawling: A Web Crawling Method for Web Forum, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI'06), 2006.
DOI : 10.1109/WI.2006.52

S. Grumbach and G. Mecca, In Search of the Lost Schema, ICDT, 1999.
DOI : 10.1007/3-540-49257-7_20

D. Gomes, J. Miranda, and M. Costa, A Survey on Web Archiving Initiatives, Proceedings of the 15th International Conference on Theory and Practice of Digital Libraries: Research and Advanced Technology for Digital Libraries, 2011.
DOI : 10.1145/602421.602422

P. Gulhane, A. Madaan, R. Mehta, J. Ramamirtham, and R. Rastogi, Web-scale information extraction with Vertex, ICDE, 2011.

G. Gouriten, S. Maniu, and P. Senellart, Scalable, generic, and adaptive systems for focused crawling, Proceedings of the 25th ACM conference on Hypertext and social media, HT '14, 2014.
DOI : 10.1145/2631775.2631795

URL : https://hal.archives-ouvertes.fr/hal-01069821

D. Gibson, K. Punera, and A. Tomkins, The volume and evolution of Web page templates, WWW, 2005.

A. Gulli and A. Signorini, The indexable web is more than 11.5 billion pages, WWW, 2005.

G. Gouriten and P. Senellart, API Blender: A uniform interface to social platform APIs, WWW, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00690621

P. Genevès and J. Vion-Dury, XPath Formal Semantics and Beyond: a Coq based approach, TPHOLs, 2004.

M. Hersovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalhaim et al., The shark-search algorithm. An application: tailored Web site mapping, Computer Networks and ISDN Systems, vol.30, issue.1-7, pp.317-326, 1998.
DOI : 10.1016/S0169-7552(98)00038-5

A. Heydon and M. Najork, Mercator: A scalable, extensible web crawler, World Wide Web, vol.2, issue.4, pp.219-229, 1999.
DOI : 10.1023/A:1019213109274

M. Halkidi and B. Nguyen, THESUS: Organizing Web document collections based on link semantics, The VLDB Journal, vol.12, issue.4, pp.320-332, 2003.
DOI : 10.1007/s00778-003-0100-6

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.349

M. Iwazume, K. Shirakami, K. Hatadani, H. Takeda, and T. Nishida, IICA: An Ontology-based Internet Navigation System, AAAI, 1996.

ISO, ISO 28500:2009 Information and documentation - WARC file format, 2009.

J. Jiang, X. Song, N. Yu, and C. Lin, FoCUS, Proceedings of the 21st international conference companion on World Wide Web, WWW '12 Companion, 2013.
DOI : 10.1145/2187980.2187985

URL : https://hal.archives-ouvertes.fr/hal-01305623

J. Johnson, K. Tsioutsiouliklis, and C. Giles, Evolving strategies for focused web crawling, Proceedings of the 20th International Conference on Machine Learning, 2003.

E. Jupp, Obama's victory tweet 'four more years' makes history, The Independent, 2012.

P. Kolari, T. Finin, and A. Joshi, SVMs for the blogosphere: Blog identification and splog detection, AAAI, 2006.

H. Kao, S. Lin, J. Ho, and M. Chen, Mining Web informative structures and contents based on entropy analysis, IEEE Trans. Knowl. Data Eng., 2004.

W. Koehler, A longitudinal study of web pages continued: a consideration of document persistence, Inf. Res., vol.9, issue.2, 2003.

J. Kranzdorf, A. Sellers, G. Grasso, C. Schallhart et al., Visual OXPath: Robust Wrapping by Example, 2012.

J. Kranzdorf, A. J. Sellers, G. Grasso, C. Schallhart, and T. Furche, Visual OXPath, Proceedings of the 21st international conference companion on World Wide Web, WWW '12 Companion, 2012.
DOI : 10.1145/2187980.2188051

N. Kushmerick, Regression testing for wrapper maintenance, AAAI, 1999.

N. Kushmerick, Wrapper induction: Efficiency and expressiveness, Artificial Intelligence, vol.118, issue.1-2, pp.15-68, 2000.
DOI : 10.1016/S0004-3702(99)00100-9

N. Kushmerick, Wrapper verification, World Wide Web, vol.3, issue.2, pp.79-94, 2000.
DOI : 10.1023/A:1019229612909

N. Kushmerick, D. S. Weld, and R. Doorenbos, Wrapper induction for information extraction, IJCAI, 1997.

J. P. Lage, A. S. da Silva, P. B. Golgher, and A. H. Laender, Automatic generation of agents for collecting hidden Web pages for data extraction, Data & Knowledge Engineering, vol.49, issue.2, pp.177-196, 2004.
DOI : 10.1016/j.datak.2003.10.003

H. Liu, J. Janssen, and E. Milios, Using HMM to learn user browsing patterns for focused Web crawling, Data & Knowledge Engineering, vol.59, issue.2, pp.270-291, 2006.
DOI : 10.1016/j.datak.2006.01.012

M. Liu and T. W. Ling, A rule-based query language for HTML, DASFAA, 2001.

C. Lindemann and L. Littig, Coarse-grained classification of web sites by their structural properties, Proceedings of the eighth ACM international workshop on Web information and data management, WIDM '06, 2006.
DOI : 10.1145/1183550.1183559

C. Lindemann and L. Littig, Classifying web sites, Proceedings of the 16th international conference on World Wide Web, WWW '07, 2007.
DOI : 10.1145/1242572.1242736

H. Lee, D. Leonard, X. Wang, and D. Loguinov, IRLbot: Scaling to 6 billion pages and beyond, ACM Trans. Web, vol.38, issue.3, pp.1-8, 2009.

K. Lerman, S. N. Minton, and C. A. Knoblock, Wrapper maintenance: A machine learning approach, J. Artificial Intelligence Research, 2003.

Y. Li, X. Meng, L. Wang, and Q. Li, RecipeCrawler: Collecting Recipe Data from WWW Incrementally, Advances in Web-Age Information Management, 2006.
DOI : 10.1007/11775300_23

S. Lim and Y. Ng, An automated change-detection algorithm for HTML documents based on semantic hierarchies, ICDE, 2001.

Z. Liu, W. K. Ng, and E. Lim, An automated algorithm for extracting Website skeleton, DASFAA, 2004.

W. Lim, A. Sachan, and V. L. L. Thing, A Lightweight Algorithm for Automated Forum Information Processing, 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013.
DOI : 10.1109/WI-IAT.2013.18

J. Masanès, Web archiving, 2006.
DOI : 10.1007/978-3-540-46332-0

J. Melton and S. Buxton, Querying XML, 2006.
DOI : 10.1016/B978-155860711-8/50004-6

J. C. Mulvenon and M. Chase, You've Got Dissent! Chinese Dissident Use of the Internet and Beijing's Counter Strategies, 2002.

X. Meng, D. Hu, and C. Li, Schema-guided wrapper maintenance for web-data extraction, Proceedings of the fifth ACM international workshop on Web information and data management, WIDM '03, 2003.
DOI : 10.1145/956699.956701

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.5673

J. Madhavan, S. R. Jeffery, S. Cohen, X. L. Dong, D. Ko et al., Web-scale data integration: You can only afford to pay as you go, CIDR, 2007.

J. Madhavan, D. Ko, L. Kot, V. Ganapathy, A. Rasmussen et al., Google's Deep Web crawl, Proceedings of the VLDB Endowment, vol.1, issue.2, pp.1241-1252, 2008.
DOI : 10.14778/1454159.1454163

G. Mohr, M. Kimpton, M. Stack, and I. Ranitovic, Introduction to Heritrix, an archival quality web crawler, Proceedings of the 4th International Web Archiving Workshop, 2004.

I. Muslea, S. Minton, and C. Knoblock, A hierarchical approach to wrapper induction, Proceedings of the third annual conference on Autonomous Agents, AGENTS '99, 1999.
DOI : 10.1145/301136.301191

F. Menczer, G. Pant, P. Srinivasan, and M. E. Ruiz, Evaluating topic-driven web crawlers, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '01, 2001.
DOI : 10.1145/383952.383995

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.9569

A. Ntoulas, J. Cho, and C. Olston, What's new on the web?, Proceedings of the 13th conference on World Wide Web, WWW '04, 2004.
DOI : 10.1145/988672.988674

M. Najork and A. Heydon, High-Performance Web Crawling, Handbook of Massive Data Sets, pp.25-45, 2002.
DOI : 10.1002/1096-9128(200005)12:6<363::AID-CPE479>3.0.CO;2-3

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.136.2388

M. A. Nielsen, Reinventing Discovery: The New Era of Networked Science, 2011.

X. Ochoa and E. Duval, Quantitative analysis of user-generated content on the Web, WebEvolve, 2008.

E. Osuna, R. Freund, and F. Girosi, An improved training algorithm for support vector machines, Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop, 1997.
DOI : 10.1109/NNSP.1997.622408

C. Olston and M. Najork, Web crawling, Found. Trends Inf. Retr., vol.4, issue.3, pp.175-246, 2010.
DOI : 10.1561/1500000017

C. Olston and S. Pandey, Recrawl scheduling based on information longevity, Proceeding of the 17th international conference on World Wide Web, WWW '08, 2008.
DOI : 10.1145/1367497.1367557

V. Plachouras, F. Carpentier, J. Masanès, T. Risse, P. Senellart, P. Siehndel, and Y. Stavrakas, An architecture for selective web harvesting: The use case of Heritrix, Proceedings of the 1st International Workshop on Archiving Community Memories, 2013.

S. Pandey, K. Dhamdhere, and C. Olston, WIC, VLDB, 2004.
DOI : 10.1016/B978-012088469-8.50034-6

N. Papailiou, I. Konstantinou, D. Tsoumakos, and N. Koziris, H2RDF, Proceedings of the 21st international conference companion on World Wide Web, WWW '12 Companion, 2012.
DOI : 10.1145/2187980.2188058

URL : http://dspace.lib.ntua.gr/handle/123456789/36469

T. Risse, S. Dietze, W. Peters, K. Doka, Y. Stavrakas, and P. Senellart, Exploiting the social and semantic web for guided web archiving, Theory and Practice of Digital Libraries, pp.426-432, 2012.

D. Raggett, A. Le Hors, and I. Jacobs, HTML 4.01 specification, 1999.

Royal Pingdom, WordPress completely dominates top 100 blogs, 2012.

J. Raposo, A. Pan, M. Álvarez, and J. Hidalgo, Automatically maintaining wrappers for semi-structured web sources, Data & Knowledge Engineering, vol.61, issue.2, 2007.
DOI : 10.1016/j.datak.2006.06.006

A. Sahuguet and F. Azavant, Building light-weight wrappers for legacy Web data-sources using W4F, VLDB, 1999.

M. Spaniol, D. Denev, A. Mazeika, G. Weikum, and P. Senellart, Data quality in web archiving, Proceedings of the 3rd workshop on Information credibility on the web, WICOW '09, 2009.
DOI : 10.1145/1526993.1526999

W. Shen, A. Doan, J. F. Naughton, and R. Ramakrishnan, Declarative information extraction using Datalog with embedded extraction predicates, VLDB, 2007.

A. Sellers, T. Furche, G. Gottlob, G. Grasso, and C. Schallhart, Exploring the Web with OXPath, LWDM, 2011.

K. Sigurðsson, Incremental crawling with Heritrix, IWAW, 2005.

N. Sawa, A. Morishima, S. Sugimoto, and H. Kitagawa, Wraplet: Wrapping Your Web Contents with a Lightweight Language, 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, 2007.
DOI : 10.1109/SITIS.2007.135

S. Soderland, Learning information extraction rules for semistructured and free text, Machine Learning, pp.233-272, 1999.

V. Shkapenyuk and T. Suel, Design and implementation of a high-performance distributed web crawler, Proceedings of the 18th International Conference on Data Engineering, pp.357-368, 2002.

J. Su, D. Sun, I. Wu, and L. Chen, On design of browser-oriented data extraction system and plug-ins, JMST, vol.18, 2010.

T. Tang, D. Hawking, N. Craswell, and K. Griffiths, Focused crawling for both topical relevance and quality of medical information, Proceedings of the 14th ACM international conference on Information and knowledge management, CIKM '05, 2005.
DOI : 10.1145/1099554.1099583

Twitter, Historical data not working, 2011.

M. L. A. Vidal, A. S. da Silva, E. S. de Moura, and J. M. B. Cavalcanti, GoGetIt!: a tool for generating structure-driven Web crawlers, WWW, 2006.

M. L. A. Vidal, A. S. da Silva, E. S. de Moura, and J. M. B. Cavalcanti, Structure-driven crawler generation by example, SIGIR, 2006.

W3C, XML Query (XQuery) Requirements, http://www.w3.org/TR/xquery-requirements, 2007.

J. Wang and F. H. Lochovsky, Data-rich section extraction from HTML pages, WISE, 2002.

J. Wang and F. H. Lochovsky, Data extraction and label assignment for web databases, Proceedings of the twelfth international conference on World Wide Web, WWW '03, 2003.
DOI : 10.1145/775152.775179

Y. Wang, J. Yang, W. Lai, R. Cai, L. Zhang et al., Exploring traversal strategy for web forum crawling, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '08, 2008.
DOI : 10.1145/1390334.1390413

Y. Xia, Y. Yang, F. Ge, S. Zhang, and H. Yu, Automatic wrappers generation and maintenance, PACLIC, 2011.

H. Ying and V. L. Thing, An enhanced intelligent forum crawler, 2012 IEEE Symposium on Computational Intelligence for Security and Defence Applications, 2012.
DOI : 10.1109/CISDA.2012.6291523

H. Zheng, B. Kang, and H. Kim, An ontology-based approach to learnable focused crawling, Information Sciences, vol.178, issue.23, pp.4512-4522, 2008.
DOI : 10.1016/j.ins.2008.07.030

Y. Zhai and B. Liu, Web data extraction based on partial tree alignment, Proceedings of the 14th international conference on World Wide Web, WWW '05, 2005.
DOI : 10.1145/1060745.1060761

S. Zheng, R. Song, J. Wen, and D. Wu, Joint optimization of wrapper generation and template detection, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '07, 2007.
DOI : 10.1145/1281192.1281287