M. Chilowicz, É. Duris, and G. Roussel, Syntax tree fingerprinting for source code similarity detection, 2009 IEEE 17th International Conference on Program Comprehension, pp.243-247, 2009.
DOI : 10.1109/ICPC.2009.5090050

URL : https://hal.archives-ouvertes.fr/hal-00627811

M. Chilowicz, É. Duris, and G. Roussel, Finding similarities in source code through factorization, Electronic Notes in Theoretical Computer Science, pp.47-62, 2009.

M. Chilowicz, É. Duris, and G. Roussel, Towards a multi-scale approach for source code approximate match report, Proceedings of the 4th International Workshop on Software Clones, IWSC '10, pp.89-90, 2010.
DOI : 10.1145/1808901.1808919

URL : https://hal.archives-ouvertes.fr/hal-00620368

O. Berkman and U. Vishkin, Recursive star-tree parallel data structure, SIAM Journal on Computing, vol.22, issue.2, pp.221-242, 1993.

D. Comer, Ubiquitous B-Tree, ACM Computing Surveys, vol.11, issue.2, pp.121-137, 1979.
DOI : 10.1145/356770.356776

J. Earley, An efficient context-free parsing algorithm, Communications of the ACM, vol.13, issue.2, pp.94-102, 1970.
DOI : 10.1145/362007.362035

M. Ester, H. Kriegel, J. Sander, and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, Knowledge Discovery and Data Mining, pp.226-231, 1996.

J. Fischer and V. Heun, A new succinct representation of RMQ-information and improvements in the enhanced suffix array, Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, pp.459-470, 2007.

E. R. Gansner, E. Koutsofios, S. C. North, and K. Vo, A technique for drawing directed graphs, IEEE Transactions on Software Engineering, vol.19, issue.3, pp.214-230, 1993.
DOI : 10.1109/32.221135

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1990.

R. Hamming, Error Detecting and Error Correcting Codes, Bell System Technical Journal, vol.29, issue.2, pp.147-160, 1950.
DOI : 10.1002/j.1538-7305.1950.tb00463.x

M. Kowaluk and A. Lingas, LCA Queries in Directed Acyclic Graphs, Automata, Languages and Programming, pp.241-248, 2005.
DOI : 10.1007/11523468_20

V. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics Doklady, pp.707-717, 1966.

M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2008.

S. P. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, vol.28, issue.2, pp.129-137, 1982.
DOI : 10.1109/TIT.1982.1056489

S. Mantaci, A. Restivo, G. Rosone, and M. Sciortino, An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression, Combinatorial Pattern Matching, pp.178-189, 2005.
DOI : 10.1007/11496656_16

H. G. Rice, Classes of recursively enumerable sets and their decision problems. Transactions of the, pp.358-366, 1953.

J. Sima and S. E. Schaeffer, On the NP-completeness of some graph cluster measures, SOFSEM 2006: Theory and Practice of Computer Science, pp.530-537, 2006.

B. Stein and O. Niggemann, On the Nature of Structure and Its Identification, Proceedings of the 25th International Workshop on Graph-Theoretic Concepts in Computer Science, pp.122-134, 1999.
DOI : 10.1007/3-540-46784-X_13

R. Tarjan, Depth-first search and linear graph algorithms, SWAT '71: Proceedings of the 12th Annual Symposium on Switching and Automata Theory, pp.114-121, 1971.
DOI : 10.1109/swat.1971.10

M. Tomita, An efficient augmented-context-free parsing algorithm, Computational Linguistics, vol.13, pp.31-46, 1987.

D. H. Younger, Recognition and parsing of context-free languages in time n^3, Information and Control, pp.189-208, 1967.

J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Transactions on Information Theory, vol.23, issue.3, pp.337-343, 1977.
DOI : 10.1109/TIT.1977.1055714

M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch, Replacing suffix trees with enhanced suffix arrays, Journal of Discrete Algorithms, vol.2, issue.1, 2004.
DOI : 10.1016/S1570-8667(03)00065-0

URL : http://doi.org/10.1016/s1570-8667(03)00065-0

D. R. Clark and J. I. Munro, Efficient suffix trees on secondary storage, Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms, pp.383-391, 1996.

R. Clifford and M. Sergot, Distributed and Paged Suffix Trees for Large Genetic Databases, Proceedings of the 14th Annual Symposium on Combinatorial Pattern Matching, pp.70-82, 2003.
DOI : 10.1007/3-540-44888-8_6

M. Farach, Optimal suffix tree construction with large alphabets, Proceedings 38th Annual Symposium on Foundations of Computer Science, p.137, 1997.
DOI : 10.1109/SFCS.1997.646102

M. Gallé, P. Peterlongo, and F. Coste, In-Place Update of Suffix Array While Recoding Words, Proceedings of the Prague Stringology Conference, pp.54-67, 2008.
DOI : 10.1142/S0129054109007029

R. Giegerich and S. Kurtz, From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction, Algorithmica, vol.19, issue.3, pp.331-353, 1997.
DOI : 10.1007/PL00009177

J. Kärkkäinen and P. Sanders, Simple Linear Work Suffix Array Construction, Proceedings of the International Colloquium on Automata, Languages and Programming, pp.943-955, 2003.
DOI : 10.1007/3-540-45061-0_73

T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park, Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications, 12th Annual Symposium on Combinatorial Pattern Matching, pp.181-192, 2001.
DOI : 10.1007/3-540-48194-X_17

S. Kurtz, Reducing the space requirement of suffix trees, Software: Practice and Experience, pp.1149-1171, 1999.

J. Kärkkäinen, Suffix cactus: A cross between suffix tree and suffix array, Combinatorial Pattern Matching, pp.191-204, 1995.
DOI : 10.1007/3-540-60044-2_43

U. Manber and G. Myers, Suffix Arrays: A New Method for On-Line String Searches, SIAM Journal on Computing, vol.22, issue.5, 1990.
DOI : 10.1137/0222058

E. McCreight, A Space-Economical Suffix Tree Construction Algorithm, Journal of the ACM, vol.23, issue.2, pp.262-272, 1976.
DOI : 10.1145/321941.321946

S. J. Puglisi, W. F. Smyth, and A. H. Turpin, A taxonomy of suffix array construction algorithms, ACM Computing Surveys, vol.39, issue.2, 2007.
DOI : 10.1145/1242471.1242472

M. Salson, T. Lecroq, M. Léonard, and L. Mouchard, Dynamic extended suffix arrays, Journal of Discrete Algorithms, vol.8, issue.2, pp.241-257, 2010.
DOI : 10.1016/j.jda.2009.02.007

URL : https://hal.archives-ouvertes.fr/hal-00468910

K. Schürmann and J. Stoye, An incomplex algorithm for fast suffix array construction, Software: Practice and Experience, pp.309-329, 2007.

E. Ukkonen, On-line construction of suffix trees, Algorithmica, vol.10, issue.3, pp.249-260, 1995.
DOI : 10.1007/BF01206331

P. Weiner, Linear pattern matching algorithms, 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), pp.1-11, 1973.
DOI : 10.1109/SWAT.1973.13

G. Cobena, Gestion des changements pour les données semi-structurées du web, 2003.

E. D. Demaine, S. Mozes, B. Rossman, and O. Weimann, An optimal decomposition algorithm for tree edit distance, LNCS: Automata, Languages and Programming, pp.146-157, 2007.

D. S. Hirschberg, A linear space algorithm for computing maximal common subsequences, Communications of the ACM, vol.18, issue.6, pp.341-343, 1975.
DOI : 10.1145/360825.360861

P. N. Klein, Computing the Edit-Distance Between Unrooted Ordered Trees, Proceedings of the 6th Annual European Symposium on Algorithms, pp.91-102, 1998.
DOI : 10.1007/3-540-68530-8_8

S. Manavski and G. Valle, CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics, vol.9, issue.Suppl 2, p.10, 2008.
DOI : 10.1186/1471-2105-9-S2-S10

URL : http://doi.org/10.1186/1471-2105-9-s2-s10

T. F. Smith and M. S. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology, vol.147, issue.1, pp.195-197, 1981.
DOI : 10.1016/0022-2836(81)90087-5

H. Touzet and S. Dulucq, Analysis of tree edit distance algorithms, Proceedings of the 14th annual symposium on Combinatorial Pattern Matching, pp.83-95, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00307516

M. Waterman and M. Eggert, A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons, Journal of Molecular Biology, vol.197, issue.4, pp.723-728, 1987.
DOI : 10.1016/0022-2836(87)90478-5

K. Zhang and D. Shasha, Simple Fast Algorithms for the Editing Distance between Trees and Related Problems, SIAM Journal on Computing, vol.18, issue.6, pp.1245-1262, 1989.
DOI : 10.1137/0218082

D. Eastlake and P. Jones, RFC 3174: US secure hash algorithm 1 (SHA1), 2001.

A. Gionis, P. Indyk, and R. Motwani, Similarity search in high dimensions via hashing, Proceedings of the 25th International Conference on Very Large Data Bases, pp.518-529, 1999.

R. M. Karp and M. O. Rabin, Efficient randomized pattern-matching algorithms, IBM Journal of Research and Development, vol.31, issue.2, pp.249-260, 1987.
DOI : 10.1147/rd.312.0249

D. Knuth, The Art of Computer Programming, Sorting and Searching, Third Edition, 1997.

R. Rivest, RFC 1321: The MD5 message-digest algorithm, 1992.

S. Schleimer, D. S. Wilkerson, and A. Aiken, Winnowing, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, pp.76-85, 2003.
DOI : 10.1145/872757.872770

A. F. Webster and S. E. Tavares, On the Design of S-Boxes, Proceedings of the Advances in Cryptology conference, p.523, 1986.
DOI : 10.1007/3-540-39799-X_41

A. Ahtiainen, S. Surakka, and M. Rahikainen, Plaggie, Proceedings of the 6th Baltic Sea conference on Computing education research Koli Calling 2006, Baltic Sea '06, pp.141-142, 2006.
DOI : 10.1145/1315803.1315831

G. Antoniol, G. Casazza, M. D. Penta, and R. Fiutem, Object-oriented design patterns recovery, Journal of Systems and Software, vol.59, issue.2, pp.181-196, 2001.
DOI : 10.1016/S0164-1212(01)00061-9

M. Balazinska, E. Merlo, M. Dagenais, B. Lage, and K. Kontogiannis, Advanced clone-analysis to support object-oriented system refactoring, p.98, 2000.
DOI : 10.1109/wcre.2000.891457

H. A. Basit and S. Jarzabek, A case for structural clones, Proceedings of the Third International Workshop on Software Clones, pp.7-11, 2009.

H. A. Basit and S. Jarzabek, A Data Mining Approach for Detecting Higher-Level Clones in Software, IEEE Transactions on Software Engineering, vol.35, issue.4, pp.497-514, 2009.
DOI : 10.1109/TSE.2009.16

I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna, and L. Bier, Clone detection using abstract syntax trees, Proceedings. International Conference on Software Maintenance (Cat. No. 98CB36272), p.368, 1998.
DOI : 10.1109/ICSM.1998.738528

P. Bulychev and M. Minea, An evaluation of duplicate code detection using antiunification, Proceedings of the International Workshop on Software Clones, pp.22-27, 2009.

S. Ducasse, M. Rieger, and S. Demeyer, A language independent approach for detecting duplicated code, Proceedings IEEE International Conference on Software Maintenance, 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360), 1999.
DOI : 10.1109/ICSM.1999.792593

T. Imai, Y. Kataoka, and T. Fukaya, Evaluating software maintenance cost using functional redundancy metrics, Proceedings 26th Annual International Computer Software and Applications, pp.299-309, 2002.
DOI : 10.1109/CMPSAC.2002.1045018

R. Irving, Plagiarism and collusion detection using the Smith-Waterman algorithm, PAPERS, vol.7444, 2004.

P. Jablonski and D. Hou, CReN, Proceedings of the 2007 OOPSLA workshop on eclipse technology eXchange, eclipse '07, pp.16-20, 2007.
DOI : 10.1145/1328279.1328283

A. Jadalla and A. Elnagar, PDE4Java: Plagiarism Detection Engine for Java source code: a clustering approach, International Journal of Business Intelligence and Data Mining, vol.3, issue.2, pp.121-135, 2008.
DOI : 10.1504/IJBIDM.2008.020514

L. Jiang, G. Misherghi, Z. Su, and S. Glondu, DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones, 29th International Conference on Software Engineering (ICSE'07), pp.96-105, 2007.
DOI : 10.1109/ICSE.2007.30

T. Kamiya, S. Kusumoto, and K. Inoue, CCFinder: a multilinguistic token-based code clone detection system for large scale source code, IEEE Transactions on Software Engineering, vol.28, issue.7, pp.654-670, 2002.
DOI : 10.1109/TSE.2002.1019480

R. Koschke, R. Falke, and P. Frenzel, Clone Detection Using Abstract Syntax Suffix Trees, 2006 13th Working Conference on Reverse Engineering, pp.253-262, 2006.
DOI : 10.1109/WCRE.2006.18

T. Lavoie, M. Eilers-Smith, and E. Merlo, Challenging cloning related problems with GPU-based algorithms, Proceedings of the 4th International Workshop on Software Clones, IWSC '10, pp.25-32, 2010.
DOI : 10.1145/1808901.1808905

L. Moussiades and A. Vakali, PDetect: A clustering approach for detecting plagiarism in source code datasets, The Computer Journal, pp.651-661, 2005.

K. J. Ottenstein, An algorithmic approach to the detection and prevention of plagiarism, ACM SIGCSE Bulletin, vol.8, issue.4, pp.30-41, 1976.
DOI : 10.1145/382222.382462

L. Prechelt, G. Malpohl, and M. Philippsen, Finding plagiarisms among a set of programs with JPlag, Journal of Universal Computer Science, vol.8, pp.1016-1038, 2000.

R. Smith and S. Horwitz, Detecting and measuring similarity in code clones, Proceedings of the Third International Workshop on Software Clones, pp.28-34, 2009.

R. Tairas and J. Gray, Phoenix-based clone detection using suffix trees, Proceedings of the 44th annual southeast regional conference, ACM-SE 44, pp.679-684, 2006.
DOI : 10.1145/1185448.1185597

Y. Ueda, T. Kamiya, S. Kusumoto, and K. Inoue, On detection of gapped code clones using gap locations, Ninth Asia-Pacific Software Engineering Conference, p.327, 2002.
DOI : 10.1109/APSEC.2002.1183002

F. Van Rysselberghe and S. Demeyer, Reconstruction of successful software evolution using clone detection, Sixth International Workshop on Principles of Software Evolution, pp.126-130, 2003.
DOI : 10.1109/IWPSE.2003.1231219

R. Wettel and R. Marinescu, Archeology of code duplication: recovering duplication chains from small duplication fragments, Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC'05), p.63, 2005.
DOI : 10.1109/SYNASC.2005.20

M. Wise, String similarity via greedy string tiling and running Karp-Rabin matching, 1993.

M. Wise, Neweyes: A system for comparing biological sequences using the Running Karp-Rabin Greedy String-Tiling algorithm, Third International Conference on Intelligent Systems for Molecular Biology, pp.393-401, 1995.

T. Yamamoto, M. Matsushita, T. Kamiya, and K. Inoue, Measuring Similarity of Large Software Systems Based on Source Code Correspondence, Product Focused Software Process Improvement, pp.530-544, 2005.
DOI : 10.1007/11497455_41

B. Zeidman, Software forensics tools enter the courtroom, IEEE Spectrum, issue.10, 2010.

S. Bellon, R. Koschke, G. Antoniol, J. Krinke, and E. Merlo, Comparison and Evaluation of Clone Detection Tools, IEEE Transactions on Software Engineering, pp.577-591, 2007.
DOI : 10.1109/TSE.2007.70725

C. K. Roy and J. R. Cordy, Scenario-Based Comparison of Clone Detection Techniques, 2008 16th IEEE International Conference on Program Comprehension, pp.153-162, 2008.
DOI : 10.1109/ICPC.2008.42

S. Schulze, S. Apel, and C. Kästner, Code clones in feature-oriented software product lines, Generative Programming and Component Engineering, 2010.
DOI : 10.1145/1942788.1868310

CloneDr, URL http://www.semdesigns.com/Products/Clone/

Copy Paste Duplication (PMD), URL http, sourceforge.net/

M. Bruntink, A. van Deursen, T. Tourwé, and R. van Engelen, An evaluation of clone detection techniques for identifying crosscutting concerns, 20th IEEE International Conference on Software Maintenance, pp.200-209, 2004.
DOI : 10.1109/ICSM.2004.1357804

J. Harder and N. Göde, Modeling clone evolution, Proceedings of the Third International Workshop on Software Clones, pp.17-21, 2009.

C. Kapser, P. Anderson, M. Godfrey, R. Koschke, M. Rieger et al., Subjectivity in clone judgment: Can we ever agree?, Duplication, Redundancy, and Similarity in Software, p.276, 2007.

A. Mockus, Large-Scale Code Reuse in Open Source Software, First International Workshop on Emerging Trends in FLOSS Research and Development (FLOSS'07: ICSE Workshops 2007), 2007.
DOI : 10.1109/FLOSS.2007.10

C. K. Roy and J. R. Cordy, An Empirical Study of Function Clones in Open Source Software, 2008 15th Working Conference on Reverse Engineering, pp.81-90, 2008.
DOI : 10.1109/WCRE.2008.54

S. Uchida, T. Kamiya, A. Monden, K. Matsumoto, N. Ohsugi et al., Software analysis by code clones in Open Source programs, Journal of Computer Information Systems, XLV, issue.3, pp.1-11, 2005.

C. A. Villano, G. Casazza, G. Antoniol, U. Villano, E. Merlo et al., Identifying clones in the linux kernel, Proceedings of the First IEEE International Workshop on Source Code Analysis and Manipulation, pp.92-100, 2001.

J. Cervelle, R. Forax, and G. Roussel, Tatoo, Proceedings of the 4th international symposium on Principles and practice of programming in Java, PPPJ '06, pp.13-20, 2006.
DOI : 10.1145/1168054.1168057

URL : https://hal.archives-ouvertes.fr/hal-00620176

C. Collberg, C. Thomborson, and D. Low, A taxonomy of obfuscating transformations, 1997.

D. Low, Protecting Java code via code obfuscation, Crossroads, vol.4, issue.3, pp.21-23, 1998.
DOI : 10.1145/332084.332092

T. J. McCabe, A Complexity Measure, Proceedings of the 2nd international conference on Software engineering, p.407, 1976.
DOI : 10.1109/TSE.1976.233837

N. A. Naeem, M. Batchelder, and L. Hendren, Metrics for Measuring the Effectiveness of Decompilers and Obfuscators, 15th IEEE International Conference on Program Comprehension (ICPC '07), pp.253-258, 2007.
DOI : 10.1109/ICPC.2007.27

T. J. Parr and R. W. Quong, ANTLR: a predicated-LL(k) parser generator, Software: Practice and Experience, pp.789-810, 1995.

G. Wroblewski, General method of program code obfuscation, 2002.

CUP parser generator, URL http://www2.cs.tum

C. Kapser and M. W. Godfrey, Improved tool support for the investigation of duplication in software, 21st IEEE International Conference on Software Maintenance (ICSM'05), pp.305-314, 2005.
DOI : 10.1109/ICSM.2005.52

M. Rieger, S. Ducasse, and M. Lanza, Insights into system-wide code duplication, 11th Working Conference on Reverse Engineering, pp.100-109, 2004.
DOI : 10.1109/WCRE.2004.25

I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, Secure spread spectrum watermarking for multimedia, IEEE Transactions on Image Processing, vol.6, issue.12, pp.1673-1687, 1997.
DOI : 10.1109/83.650120

E. W. Dijkstra, Letters to the editor: go to statement considered harmful, Communications of the ACM, vol.11, issue.3, pp.147-148, 1968.
DOI : 10.1145/362929.362947

G. Frantzeskou, E. Stamatatos, S. Gritzalis, and S. Katsikas, Effective identification of source code authors using byte-level information, Proceedings of the 28th international conference on Software engineering, ICSE '06, pp.893-896, 2006.
DOI : 10.1145/1134285.1134445

G. Kacmarcik and M. Gamon, Obfuscating document stylometry to preserve author anonymity, Proceedings of the COLING/ACL on Main conference poster sessions, pp.444-451, 2006.
DOI : 10.3115/1273073.1273131

URL : http://acl.ldc.upenn.edu/P/P06/P06-2058.pdf

I. Krsul and E. H. Spafford, Authorship analysis: identifying the author of a program, Computers & Security, vol.16, issue.3, pp.233-257, 1997.
DOI : 10.1016/S0167-4048(97)00005-9

K. Monostori, A. Zaslavsky, and H. Schmidt, MatchDetectReveal: finding overlapping and similar digital documents, Proceedings of the International Conference on Challenges of Information Technology Management in the 21st century, pp.955-957, 2000.

M. D. Swanson, B. Zhu, A. H. Tewfik, and L. Boney, Robust audio watermarking using perceptual masking, Signal Processing, pp.337-355, 1998.
DOI : 10.1016/s0165-1684(98)00014-0