E. Dans-chaque-langue, F. Permet-de-diriger-vers-une-page-de-la-même-langue, M. Aguiar, and . Beigbeder, Des moteurs de recherche efficaces pour des systèmes hypertextes grâce aux contextes des noeuds, Colloque International : Technologies de l'Information et de la Communication dans les Enseignements d'ingénieurs et dans l'industrie, 2000.

J. J. Aberdeen, D. Burger, L. Day, P. Hirschman, M. Robinson et al., MITRE, Proceedings of the 6th conference on Message understanding , MUC6 '95, 1995.
DOI : 10.3115/1072399.1072413

[. Ahmed, S. Cha, and C. Tappert, Language identification from text using n-gram based cumulative frequency addition, CSIS Research Day, 2004.

B. Adelberg, NoDoSE -a tool for semi-automatically extracting semi-structured data from text documents, SIGMOD Conference, pp.283-294, 1998.

[. Adriani, Ambiguity problem in multilingual information retrieval, CLEF, pp.156-165, 2000.

. R. Ahb-+-95-]-e, J. R. Appelt, J. Hobbs, D. Bear, M. Israel et al., SRI international FASTUS system : MUC-6 test results and analysis, Proceedings of the Sixth Message Understanding Conference, 1995.

K. S. Azzam, R. Humphreys, H. Gaizauskas, Y. Cunningham, and . Wilks, Using a language independent domain model for multilingual information extraction, Special Issue on Multilinguality in the Software Industry : the AI Contribution (MULSAIC-97, 1999.
DOI : 10.1080/088395199117252

[. Albert, H. Jeong, and A. Barabasi, The diameter of the World Wide Web. CoRR, cond-mat, 1999.

O. Gustavo, A. O. Arocena, and . Mendelzon, WebOQL : Restructuring documents, databases, and Webs, TAPOS, vol.5, issue.3, pp.127-141, 1999.

[. Atzeni, G. Mecca, and P. Merialdo, Semistructured and structured data in the Web: going back and forth, ACM SIGMOD Record, vol.26, issue.4, pp.16-23, 1997.
DOI : 10.1145/271074.271080

]. D. App99 and . Appelt, An introduction to information extraction, Artificial Intelligence Communications, vol.12, issue.3, pp.161-172, 1999.

A. Barabasi and R. Albert, Emergence of scaling in random networks, Science, vol.286, pp.509-512, 1999.

J. Barwise, The Situation in Logic, volume 17 of CSLI Lecture Notes. Center for the Study of Language and Information Publications, 1989.

]. E. Bat92 and . Batchelder, A learning experience : Training an artificial neural network to discriminate languages, 1992.

Y. [. Bottou, G. Bengio-tesauro, D. Touretzky, and T. Leen, Convergence properties of the Kmeans algorithm, Advances in Neural Information Processing Systems 7, 1995.

]. K. Bee88 and . Beesley, Language identifier : A computer program for automatic natural-language identification on on-line text, the 29th Annual Conference of the American Translators Association, pp.47-54, 1988.

N. Bertoldi and M. Federico, ITC-irst at CLEF 2003: Monolingual, Bilingual, and Multilingual Information Retrieval, CLEF, pp.140-151, 2003.
DOI : 10.1007/978-3-540-30222-3_13

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.501.1222

[. Besançon, O. Ferret, and C. Fluhr, Integrating New Languages in a Multilingual Search System Based on a Deep Linguistic Analysis, CLEF, pp.83-89, 2004.
DOI : 10.1007/11519645_8

[. Baumgartner, S. Flesca, and G. Gottlob, Supervised wrapper generation with Lixto, VLDB, pp.715-716, 2001.

[. Boyan, D. Freitag, and T. Joachims, A machine learning architecture for optimizing Web search engine, Workshop on Internet-Based Information Systems (W- AAAI'96), 1996.

K. Bharat and M. R. Henzinger, Improved algorithms for topic distillation in a hyperlinked environment, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval , SIGIR '98, pp.104-111, 1998.
DOI : 10.1145/290941.290972

A. Broder and R. Kumar, Graph structure in the Web, Computer Networks, vol.33, issue.1-6, 2000.
DOI : 10.1016/S1389-1286(00)00083-9

[. Broekstra, A. Kampman, and F. Van-harmelen, Sesame : A generic architecture for storing and querying RDF and RDF Schema, International Semantic Web Conference, pp.54-68, 2002.

H. [. Bingham and . Mannila, Random projection in dimensionality reduction, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining , KDD '01, 2001.
DOI : 10.1145/502512.502546

[. Bray, Measuring the Web, Computer Networks and ISDN Systems, vol.28, issue.7-11, pp.993-1005, 1996.
DOI : 10.1016/0169-7552(96)00061-X

]. B. Bre93 and . Breton, Histoire de l'informatique. La découverte, 1993.

S. Brin, Extracting Patterns and Relations from the World Wide Web, WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT'98, 1998.
DOI : 10.1007/10704656_11

R. A. Botafogo, E. Rivlin, and B. Shneiderman, Structural analysis of hypertexts: identifying hierarchies and useful metrics, ACM Transactions on Information Systems, vol.10, issue.2, pp.142-180, 1992.
DOI : 10.1145/146802.146826

M. Braschler and P. Schäuble, Multilingual Information Retrieval Based on Document Alignment Techniques, ECDL, pp.183-197, 1998.
DOI : 10.1007/3-540-49653-X_12

[. Bush, As we may think. The Atlantic Monthly, pp.101-108, 1945.

Y. [. Bontcheva and . Wilks, Automatic Report Generation from Ontologies: The MIAKT Approach, Nineth International Conference on Applications of Natural Language to Information Systems, 2004.
DOI : 10.1007/978-3-540-27779-8_28

B. Bibliographie-soumen-chakrabarti, R. Dom, P. Kumar, . Raghavan, A. Sridhar-rajagopalan et al., Empirical methods in information extraction. AI Magazine Mining the Web's link structure Bigraph and trigraph models for language identification and character recognition Towards the self-annotating Web WebQuery : Searching and visualizing the Web through connectivity IEPAD : information extraction based on pattern discovery GATE : A framework and graphical development environment for robust NLP tools and applications Automatic web information extraction in the ROADRUNNER system, Califf. Relational Learning Techniques for Natural Language Information Extraction AISB Workshop on Computational Linguistics for Speech and Handwriting Recognition WWW '04 : Proceedings of the 13th international conference on World Wide WebCL96] J. Cowie and W. Lehnert. Information extraction. Communications of the ACM WWWCLM00] Vincenza Carchiolo, Alessandro Longheu, and Michele Malgeri. Extracting logical schema from the web. In PRICAI Workshop on Text and Web MiningCM98] Valter Crescenzi and Giansalvatore Mecca. Grammars have exceptions Proceedings of the 40th Anniversary Meeting of the Association for Computational LinguisticsCMM01] Valter Crescenzi ER (Workshops), pp.60-67, 1994.

[. Christophides and A. Rizk, Querying structured documents with hypertext links using OODBMS, Proceedings of the 1994 ACM European conference on Hypermedia technology , ECHT '94, pp.186-197, 1994.
DOI : 10.1145/192757.192799

D. [. Cunningham and . Scott, Introduction to the special issue on software architecture for language engineering, Natural Language Engineering, 2004.

B. William, J. M. Cavnar, and . Trenkle, Ngram -based text categorization, Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp.161-175, 1994.

J. [. Chen and . Trenkle, An empirical study of smoothing techniques for language modeling, 1998.

]. H. Cun05 and . Cunningham, Information extraction, automatic. Encyclopedia of Language and Linguistics, 2005.

]. M. Dam95 and . Damashek, Gauging similarity with n-grams : Languageindependent categorization of text, Science, vol.267, issue.10, pp.843-848, 1995.

C. [. Declerck and . Crispi, Multilingual linguistic modules for IE systems, Proceedings of Workshop on Information Extraction for Slavonic and other Central and Eastern European Languages (IESL'03), 2003.

F. [. Dingli, Y. Ciravegna, and . Wilks, Automatic semantic annotation using unsupervised information extraction and integration, Workshop on Knowledge Markup and Semantic Annotation, 2003.

J. Domingue and M. Dzbor, Magpie, Proceedings of the 9th international conference on Intelligent user interface , IUI '04, pp.191-197, 2004.
DOI : 10.1145/964442.964479

S. Dill, N. Eiron, D. Gibson, D. Gruhl, R. Guha et al., SemTag and seeker, Proceedings of the twelfth international conference on World Wide Web , WWW '03, pp.178-186, 2003.
DOI : 10.1145/775152.775178

]. G. Bibliographie-[-dej82 and . Dejong, An overview of the FRUMP system, Strategies for Natural Language Processing, pp.149-176, 1982.

D. Dini, L. Liebwald, and . Mommers, Wim Peters, Erich Peters, and Wim Voermans. Cross-lingual legal information retrieval using a WordNet architecture, ICAIL, pp.163-167, 2005.

J. [. Dorogovtsev, A. N. Mendes, and . Samukhin, Structure of Growing Networks with Preferential Linking, Physical Review Letters, vol.85, issue.21, pp.4633-4636, 2000.
DOI : 10.1103/PhysRevLett.85.4633

]. T. Dun94 and . Dunning, Statistical identification of language, 1994.

D. W. Embley, D. M. Campbell, Y. S. Jiang, S. W. Liddle, Y. Ng et al., Conceptual-model-based data extraction from multiplerecord Web pages, Data Knowl. Eng, issue.3, pp.31227-251, 1999.

]. L. Eik99 and . Eikvil, Information extraction from World Wide Web -a survey, 1999.

R. Estival and J. Meyriat, La dialectique de l'´ ecrit et du document. un effort de synthèse. Schéma et schématisation, pp.82-91, 1981.

A. [. Erdos and . Renyi, On random graphs, Publicationes Mathematicae, vol.6, pp.290-297, 1959.

. Erc-+-00-]-k, V. Efe, C. Raghavan, A. Chu, L. Broadwater et al., The shape of the Web and its implications for searching the Web, 2000.

O. Etzioni, The World-Wide Web: quagmire or gold mine?, Communications of the ACM, vol.39, issue.11, pp.65-68, 1996.
DOI : 10.1145/240455.240473

C. Fellbaum, WordNet -An Electronic Lexical Database, 1998.

S. Gary-william-flake, C. L. Lawrence, F. Giles, and . Coetzee, Self-organization and identification of Web communities, Computer, vol.35, issue.3, pp.66-71, 2002.
DOI : 10.1109/2.989932

[. Fluhr, Systèmes multilingue recherche interlingue, Conférence Internationale sur le Document Electronique, 2005.

[. Fuller, E. Mackie, R. Sacks-davis, and R. Wilkinson, Structured answers for a large structured document collection [Für99] Johannes Fürnkranz Exploiting structural information for text classification on the WWW [Fre98] D. Freitag. Information extraction from HTML : Application of a general machine learning approach Multilingual sentence categorization according to language. CoRR, cmp-lg/9502039, 16ème ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'93) IDA Proceesings of Fifteenth National Conference on Artificial Intelligence (AAAI- 98), 1998. [Gig95] Emmanuel GiguetGig98] Emmanuel Giguet. Méthode pour l'analyse automatique de structures formelles sur documents multilinguesGKR98] David Gibson, Jon M. Kleinberg, and Prabhakar Raghavan. Inferring Web communities from link topology. In Hypertext, pp.204-213, 1993.

J. Guillaume and M. Latapy, The Web graph : an overview, InQuatrì emes Rencontres francophones sur les aspects algorithmiques des télécommunications (ALGOTEL'02), 2002.
URL : https://hal.archives-ouvertes.fr/hal-00016817

J. Guillaume and M. Latapy, Topologie d'Internet et du Web : mesure et modélisation, Premier colloque Mesures de l'Internet, 2003.

J. Guillaume and M. Latapy, Bipartite graphs as models of complex networks, CAAN, pp.127-139, 2004.

J. Guillaume and M. Latapy, Bipartite structure of all complex networks, Gér02] Mathias Géry. Indexation et interrogation de chemins de lecture en contexte pour la Recherche d'Information Structurée sur le Web, pp.215-221, 2002.
DOI : 10.1016/j.ipl.2004.03.007

URL : https://hal.archives-ouvertes.fr/hal-00016855

]. R. Bibliographie-[-gri95, ]. R. Grishmangri01, . [. Gruber, J. Grishman, . [. Sterling et al., Adaptive information extraction and sublanguage analysis A translation approach to portable ontologies Knowledge Acquisition Description of the Proteus system as used for MUC-5 Message Understanding Conference -6 : A brief history Autowrapper : automatic wrapper generation for multiple online services Information extraction : Beyond document retrieval Description of the lasie system as used for muc-6 Hyweb : Un système d'interrogation orienté objet pour le web Description of the FASTUS system as used for MUC-4, TIPSTER architecture design document version 2.0 (tinman architecture Proceedings of Workshop on Adaptive Text Extraction and Mining at Seventeenth International Joint Conference on Artificial Intelligence Proceedings of the Fifth Message Understanding Conference (MUC-5) Proceedings of the 16th International Conference on Computational Linguistics The Asia Pacic Web Conference Proceedings of the Sixth Message Understanding Conference BDA Proceedings of the Fourth Message Understanding Conference MUC-4, pp.199-220, 1992.

J. Hayes, Language recognition using two-and three-letter clusters, 1993.

M. [. Hsu and . Dung, Generating finite-state transducers for semistructured data extraction from the Web, Special Issue on Semistructured Data, 1998.

J. Hammer, H. Garcia-molina, S. Nestorov, R. Yerneni, M. M. Breunig et al., Template-based wrappers in the TSIMMIS system, SIG- MOD Conference, pp.532-535, 1997.

M. Maruf-hasan and Y. Matsumoto, Multilingual document alignment -a study with chinese and japanese, NLPRS, pp.617-623, 2001.

]. J. Hob91 and . Hobbs, Description of the TACITUS system as used for MUC-3, Proceedings of the Third Message Understanding Conference MUC-3, pp.200-206, 1991.

S. [. Handschuh, F. Staab, and . Ciravegna, S-CREAM -semiautomatic creation of metadata, 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), pp.358-372, 2002.

]. S. Huf95 and . Huffman, Learning information extraction patterns from examples. Workshop on new approaches to learning for natural language processing (IJCAI-95), pp.127-142, 1995.

R. Jakobson, Essais de linguistique générale, 1963.

J. Yung, J. Hsu, W. Tau, and Y. , Template-based information mining from HTML documents, AAAI/IAAI, pp.256-262, 1997.

L. [. Jacobs and . Rau, SCISOR: extracting information from on-line news, Communications of the ACM, vol.33, issue.11, pp.88-97, 1990.
DOI : 10.1145/92755.92769

]. S. Kas98 and . Kaski, Dimensionality reduction by random mapping : Fast similarity computation for clustering, International Joint Conference on Neural Networks (IJCNN'98, 1998.

H. [. Kosala and . Blockeel, Web mining research, SIGKDD Explorations, pp.1-15, 2000.
DOI : 10.1145/360402.360406

]. M. Kes63 and . Kessler, Bibliographic coupling between scientific papers, American Documentation, vol.14, pp.10-25, 1963.

W. [. Kogut and . Holmes, AeroDAML : Applying information extraction to Generate DAML Annotations from Web pages

V. Annotation, B. C. Jon, R. Kleinberg, P. Kumar, . Raghavan et al., The Web as a graph : Measurements, models, and methods, Bibliographie In First International Conference on Knowledge Capture The structure of the Web, pp.1-171849, 1999.

J. M. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, vol.46, issue.5, pp.604-632, 1999.
DOI : 10.1145/324133.324140

J. M. Kleinberg, D. Kim, and . Moldovan, Acquisition of linguistic patterns for knowledge-based information extraction Self-organizing maps Random graph models for the Web graph, FOCS, pp.31713-724, 1995.

R. Kumar, P. Raghavan, D. Sridhar-rajagopalan, A. Sivakumar, E. Tomkins et al., The Web as a graph, Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems , PODS '00, pp.1-10, 2000.
DOI : 10.1145/335168.335170

[. Sudo, R. G. , H. Kautz, B. Selman, and M. Shah, Crosslingual information extraction system evaluation Referral Web : combining social networks and collaborative filtering Wrapper induction for information extraction, Proceedings of the Sixth Message Understanding Conference Proceedings of COLINGKus97] Nicholas Kushmerick, pp.221-23663, 1995.

A. [. Lytinen and . Gershman, Atrans : Automatic processing of money transfer messages, Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI-86), pp.1089-1093, 1986.

J. [. Lamel and . Gauvain, Language identification using phonebased acoustic likelihoods, the IEEE International Conference on Accoustics, Speech, and Signal Processing (ICA94), 1994.

]. D. Lin95 and . Lin, Description of the PIE system as used for MUC-6, Proceedings of the Sixth Message Understanding Conference (MUC-6), pp.113-126, 1995.

E. [. Lavoie, How " World Wide " is the Web ? trend in internationalization of Web sites, Annual Review of OCLC Research, 1999.

[. Liu, C. Pu, and W. Han, XWRAP: an XML-enabled wrapper construction system for Web information sources, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073), pp.611-621, 2000.
DOI : 10.1109/ICDE.2000.839475

H. F. Lrnds02-]-alberto, . Laender, A. Berthier, A. Ribeiro-neto, . Soares et al., DEByE -data extraction by example, Data Knowl. Eng, vol.40, issue.2, pp.121-154, 2002.

]. A. Lrndst02, B. A. Laender, A. S. Ribeiro-neto, J. S. Da-silva, and . Teixeira, A brief survey of Web data extraction tools, SIG- MOD Record, issue.2, pp.3184-93, 2002.

]. J. Mac67 and . Macqueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp.281-297, 1967.

[. Masche, Multilingual information extraction Master's thesis, Master's Thesis, 2004.

]. D. Bibliographie-[-may03 and . Maynard, Multi-source and multilingual information extraction, 2003.

E. Mdmld02-]-adilson, A. P. Motter, Y. De-moura, P. Lai, and . Dasgupta, Topology of the conceptual network of language, Physical Review E, issue.065102, p.65, 2002.

S. [. Muslea, C. Minton, and . Knoblock, Stalker : Learning extraction rules for semistructured, Proceedings of AAAI-98 Workshop on AI and Information Integration, 1998.

[. Muslea, S. Minton, and C. A. Knoblock, Active learning for hierarchical wrapper induction, AAAI/IAAI, p.975, 1999.

A. O. Mendelzon, G. A. Mihaila, and T. Milo, Querying the World Wide Web, Fourth International Conference on Parallel and Distributed Information Systems, pp.80-91, 1996.
DOI : 10.1109/PDIS.1996.568671

]. D. Mtb-+-03, V. Maynard, K. Tablan, H. Bontcheva, Y. Cunningham et al., MUSE : a multi-source entity recognition system, 2003.

]. S. Mus65 and . Mustonen, Multiple discriminant analysis in linguistic problems, Statistical Methods in Linguistics, vol.4, 1965.

T. Nelson, How We Think., Proceedings of Online 72
DOI : 10.2307/2179725

]. P. New87 and . Newman, Foreign language identification : First step in the translation process, the 28th Annual Conference of the American Translators Accociation, pp.509-516, 1987.

. Dang-tuan-nguyen, Nouvelle méthode syntagmatique de vectorisation appliquée au Self-organizing map des textes vietnamiens, POSTER, RECITAL (Rencontre des Etudiants Chercheurs en Informatique pour le Traitement Automatique des Langues), 2004.

K. Dang-tuan-nguyen and . Zreik, Multilingual hyperdocument recognition: a document mining approach, Proceedings. 2004 International Conference on Information and Communication Technologies: From Theory to Applications, 2004., 2004.
DOI : 10.1109/ICTTA.2004.1307822

K. Dang-tuan-nguyen and . Zreik, Hyperling : Système de reconnaissance et de classification des hyperdocuments multilingues, International Conference in Computer Science « Research, Innovation and Vision of the Future» (RIVF'05), 2005.

K. Dang-tuan-nguyen, ]. E. Zreikolb03, B. F. O-'neill, R. Lavoie, and . Bennett, Multilingual Web Documents: the system Hyperling, 2006 2nd International Conference on Information & Communication Technologies, 2003.
DOI : 10.1109/ICTTA.2006.1684435

A. Popov, A. Kiryakov, and . Kirilov, Dimitar Manov , Damyan Ognyanoff, and Miroslav Goranov. KIM -semantic annotation platform, International Semantic Web Conference, pp.834-849, 2003.

A. Popov, A. Kiryakov, D. Kirilov, D. Manov, M. Ognyanoff et al., KIM -semantic annotation platform Natural Language Engineering [Poi99] Thierry Poibeau. Mixing technologies for intelligent information extraction [PP04] Muntsa Padro and Lluis Padro. Comparing methods for language identification, Actes du workshop Intelligent Information Integration (III), 16th International Joint Conference on Artificial Intelligence (IJCAI'99) Procesamiento del Lenguaje NaturalPPR96] Peter Pirolli, James Pitkow, and Ramana Rao. Silk from a Sow's Ear : Extracting usable structures from the Web Proc, pp.155-162, 1999.

L. Reeve and H. Han, Survey of semantic annotation platforms, Proceedings of the 2005 ACM symposium on Applied computing , SAC '05, pp.1634-1638, 2005.
DOI : 10.1145/1066677.1067049

]. E. Farshad-riahiril93, O. Riloff-alberto, and . Mendelzon, Elaboration automatique d'une base de données donnéesà partir d'informations semi-structurées issues du Web Automatically constructing a dictionary for information extraction tasks What is this page known for ? Computing web page reputations, INFORSID Proceedings of the Eleventh Annual Conference on Artificial Intelligence, pp.327-341, 1993.

. Bibliographie-[-rnlds99, A. Berthier, A. H. Ribeiro-neto, A. Laender, . Soares et al., Extracting semi-structured data through examples, CIKM, pp.94-101, 1999.

[. Riloff, C. Schafer, and D. Yarowsky, Inducing information extraction systems for new languages via crosslanguage projection, COLING, 2002.

[. Ricca, P. Tonella, E. Pianta, and C. Girardi, Experimental results on the alignment of multilingual web sites, Eighth European Conference on Software Maintenance and Reengineering, 2004. CSMR 2004. Proceedings., pp.288-295, 2004.
DOI : 10.1109/CSMR.2004.1281431

D. Roth, W. Tau, and Y. , Probabilistic reasoning for entity & relation recognition, Proceedings of the 19th international conference on Computational linguistics -, pp.1-7, 2002.
DOI : 10.3115/1072228.1072379

A. Sahuguet and F. Azavant, Web ecology : Recycling HTML pages as XML documents using W4F, WebDB (Informal Proceedings), pp.31-36, 1999.

[. Salton, J. Allan, and A. Singhal, Automatic text decomposition and structuring, Information Processing & Management, vol.32, issue.2, pp.127-138, 1996.
DOI : 10.1016/S0306-4573(96)85001-1

]. B. Sch00 and . Schloman, Breaking through the foreign language barrier : Resources on the web, Online Journal of Issues in Nursing, 2000.

[. Soderland, D. Fisher, J. Aseltine, and W. G. Lehnert, Issues in inductive learning of domainspecific text extraction rules, Learning for Natural Language Processing, pp.290-301, 1995.

[. Suzuki, T. Hirao, Y. Sasaki, and E. Maeda, Hierarchical directed acyclic graph kernel, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics , ACL '03, pp.32-39, 2003.
DOI : 10.3115/1075096.1075101

[. Small, Co-citation in the scientific literature : A new measure of the relationship between two documents. Essays of an Information Scientist, pp.28-31, 1974.

R. Sperer and D. W. Oard, Structured translation for cross-language information retrieval, Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval , SIGIR '00, pp.120-127, 2000.
DOI : 10.1145/345508.345562

[. Soderland, Learning to extract text-based information from the World Wide Web, KDD, pp.251-254, 1997.

]. S. Sod99 and . Soderland, Learning information extraction rules for semistructures and free text, Feb, 1999.

E. Spertus, . H. Parasitestr01-]-s, and . Strogatz, Mining structural information on the Web Exploring complex networks Knowledge-based wrapper generation by using XML, IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, pp.8-131205, 1997.

M. Thelwallum05, ]. A. Ultsch, and F. Moerchen, ESOM-Maps : tools for clustering, visualization, and classification with Emergent SOM Prediction for phoneme/syllable/word-category and identification of language using hmm, UN90] Yoshio Ueda and Seiichi Nakagawa the 1990 International Conference on Spoken Language ProcessingVA00] J. Vesanto and E. Alhoniemi. Clustering of the Self-Organizing Map. In Student Member, pp.521157-1168, 1990.

M. Vargas-vera, E. Motta, J. Domingue, M. Lanzoni, A. Stutt et al., MnM: Ontology Driven Semi-automatic and Automatic Support for Semantic Markup, EKAW '02 : Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management . Ontologies and the Semantic Web, pp.379-391, 2002.
DOI : 10.1007/3-540-45810-7_34

]. R. Wei95, H. D. Weischedel, K. W. White, and . Mccain, Description of the PLUM system as used for MUC-6 Social Network Analysis : Methods and Applications, Proceedings of the Sixth Message Understanding Conference (MUC-6), pp.55-70119, 1989.

J. Duncan, S. Watts, and . Strogatz, Collective dynamics of 'small-world' networks, Nature, vol.393, pp.440-442, 1998.

K. [. Xu, H. Netter, and . Stenzhorn, MIETTA --- a framework for uniform and multilingual access to structured database and Web information, Proceedings of the fifth international workshop on on Information retrieval with Asian languages , IRAL '00, 2000.
DOI : 10.1145/355214.355220

R. Yangarber and R. Grishman, NYU : Description of the Proteus/PET system as used for MUC-7, Proceedings of the Seventh Message Understanding Conference, 1998.

]. D. Zie91 and . Ziegler, The automatic identification of languages using linguistic recognition signals, 1991.

[. Zreik and . Dang-tuan-nguyen, Catégorisation des hyperdocuments multilingues : système Hyperling, Conférence Internationale sur le Document Electronique (CiDE.8), 2005.

A. Marc, E. Zissman, and . Singer, Automatic language identification of telephone speech messages using phoneme recognition and n-gram modeling, the IEEE International Conference on Accoustics, Speech, and Signal Processing (ICA94), 1994.