M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, Mining duplicate questions in stack overflow, Proceedings of the 13th International Conference on Mining Software Repositories, p.27, 2016.

M. H. Alalfi, E. P. Antony, and J. R. Cordy, An approach to clone detection in sequence diagrams and its application to security analysis, Software & Systems Modeling, vol.17, issue.4, pp.1287-1309, 2018.

G. Alkhatib, The maintenance problem of application software: An empirical analysis, Journal of Software Maintenance: Research and Practice, vol.4, issue.2, pp.83-104, 1992.

E. P. Antony, M. H. Alalfi, and J. R. Cordy, An approach to clone detection in behavioural models, 20th Working Conference on Reverse Engineering (WCRE), p.23, 2013.

B. S. Baker, A program for identifying duplicated code, Computing Science and Statistics, pp.49-49, 1993.

B. S. Baker, On finding duplication and near-duplication in large software systems, Proceedings of 2nd Working Conference on Reverse Engineering, p.87, 1995.

M. Balazinska, E. Merlo, M. Dagenais, B. Lague, and K. Kontogiannis, Measuring clone based reengineering opportunities, Proceedings Sixth International Software Metrics Symposium (Cat. No. PR00403), p.10, 1999.

I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna, and L. Bier, Clone detection using abstract syntax trees, Proceedings of the International Conference on Software Maintenance, vol.10, p.88, 1998.

S. Bellon, R. Koschke, G. Antoniol, J. Krinke, and E. Merlo, Comparison and evaluation of clone detection tools, IEEE Transactions on software engineering, vol.33, issue.9, pp.577-591, 2007.

L. Bratthall and M. Jørgensen, Can you trust a single data source exploratory software engineering case study?, Empirical Software Engineering, vol.7, pp.9-26, 2002.

E. Burd and J. Bailey, Evaluating clone detection tools for use during preventative maintenance, Proceedings. Second IEEE International Workshop on Source Code Analysis and Manipulation, p.10, 2002.

A. Charpentier, Contributions à l'usage des détecteurs de clones pour des tâches de maintenance logicielle, p.85, 2016.

A. Charpentier, J. Falleri, D. Lo, and L. Réveillère, An empirical assessment of Bellon's clone benchmark, Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering, pp.1-10, 2015.

A. Charpentier, J. Falleri, and L. Réveillère, Automated extraction of mixins in cascading style sheets, 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), p.26, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02182065

J. R. Cordy, Comprehending reality-practical barriers to industrial adoption of software maintenance automation, 11th IEEE International Workshop on Program Comprehension, p.87, 2003.

J. R. Cordy and C. K. Roy, The NiCad clone detector, 2011 IEEE 19th International Conference on Program Comprehension, p.23, 2011.

F. F. Correia, A. Aguiar, H. S. Ferreira, and N. Flores, Patterns for consistent software documentation, Proceedings of the 16th Conference on Pattern Languages of Programs, p.12, 2009.

B. Dagenais and M. P. Robillard, Creating and evolving developer documentation: understanding the decisions of open source contributors, Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering, pp.127-136, 2010.

N. Davey, P. Barson, S. Field, R. Frank, and D. Tansley, The development of a software clone detector, International Journal of Applied Software Technology, 1995.

S. C. De-souza, N. Anquetil, and K. M. Oliveira, A Study of the Documentation Essential to Software Maintenance, Proceedings of the 23rd Annual International Conference on Design of Communication: Documenting & Designing for Pervasive Information, SIGDOC '05, pp.68-75, 2005.

F. Deissenboeck, B. Hummel, E. Jürgens, B. Schätz, S. Wagner et al., Clone detection in automotive model-based development, Proceedings of the 30th international conference on Software engineering, p.101, 2008.

F. Deissenboeck, E. Juergens, B. Hummel, S. Wagner, B. M. Parareda et al., Tool support for continuous quality control, IEEE software, vol.25, issue.5, pp.60-67, 2008.

G. A. Di-lucca, M. Di-penta, A. R. Fasolino, and P. Granato, Clone analysis in the web era: An approach to identify cloned web pages, Proceedings of the 7th IEEE Workshop on Empirical Studies of Software Maintenance (WESS'99), pp.19-26, 2001.

C. Domann, E. Juergens, and J. Streit, The curse of Copy&Paste cloning in requirements specifications, Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, pp.443-446, 2009.

E. Duala-ekoko and M. P. Robillard, Tracking code clones in evolving software, 29th International Conference on Software Engineering (ICSE'07), p.29, 2007.

E. Duala-ekoko and M. P. Robillard, Clonetracker: tool support for code clone management, Proceedings of the 30th international conference on Software engineering, p.29, 2008.

S. Ducasse, M. Rieger, and S. Demeyer, A language independent approach for detecting duplicated code, Proceedings of the IEEE International Conference on Software Maintenance (ICSM'99), p.15, 1999.

J. Falleri, F. Morandat, X. Blanc, M. Martinez, and M. Monperrus, Fine-grained and Accurate Source Code Differencing, Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE '14, pp.313-324, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01054552

B. Fluri, M. Würsch, and H. C. Gall, Do code and comments co-evolve? On the relation between source code and comment changes, 14th Working Conference on Reverse Engineering (WCRE), p.34, 2007.

A. Forward and T. C. Lethbridge, The Relevance of Software Documentation, Tools and Technologies: A Survey, Proceedings of the 2002 ACM Symposium on Document Engineering, DocEng '02, pp.26-33, 2002.

M. Gabel, L. Jiang, and Z. Su, Scalable detection of semantic clones, Proceedings of the 30th international conference on Software engineering, pp.321-330, 2008.

S. Giesecke, Generic modelling of code clones, Dagstuhl Seminar Proceedings, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2007.

D. Gitchell and N. Tran, Sim: a utility for detecting similarity in computer programs, ACM SIGCSE Bulletin, vol.31, pp.266-270, 1999.

J. Han, J. Pei, and M. Kamber, Data mining: concepts and techniques, 2011.

F. Hermans, B. Sedee, M. Pinzger, and A. Van-deursen, Data clone detection and visualization in spreadsheets, 35th International Conference on Software Engineering (ICSE), p.27, 2013.

B. Hummel, E. Juergens, L. Heinemann, and M. Conradt, Index-based code clone detection: incremental, distributed, scalable, 2010 IEEE International Conference on Software Maintenance (ICSM), vol.21, p.63, 2010.

P. Jablonski and D. Hou, Cren: a tool for tracking copy-and-paste code clones and renaming identifiers consistently in the ide, Proceedings of the 2007 OOPSLA workshop on eclipse technology eXchange, pp.16-20, 2007.

L. Jiang, G. Misherghi, Z. Su, and S. Glondu, Deckard: Scalable and accurate tree-based detection of code clones, Proceedings of the 29th international conference on Software Engineering, pp.96-105, 2007.

J. H. Johnson, Identifying redundancy in source code using fingerprints, Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: software engineering, vol.1, pp.171-183, 1993.

E. Juergens, F. Deissenboeck, M. Feilkas, B. Hummel, B. Schaetz et al., Can clone detection support quality assessments of requirements specifications?, Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, vol.2, pp.79-88, 2010.

E. Juergens, F. Deissenboeck, and B. Hummel, Clonedetective-a workbench for clone detection research, Proceedings of the 31st International Conference on Software Engineering, pp.603-606, 2009.

E. Juergens, F. Deissenboeck, B. Hummel, and S. Wagner, Do code clones matter?, 2009 IEEE 31st International Conference on Software Engineering, pp.485-495, 2009.

T. Kamiya, S. Kusumoto, and K. Inoue, CCFinder: a multilinguistic token-based code clone detection system for large scale source code, IEEE Transactions on Software Engineering, vol.28, issue.7, pp.654-670, 2002.

C. Kapser, Toward an understanding of software code cloning as a development practice, p.87, 2009.

C. Kapser and M. W. Godfrey, "Cloning considered harmful" considered harmful, 13th Working Conference on Reverse Engineering (WCRE'06), pp.19-28, 2006.

R. M. Karp and M. O. Rabin, Efficient randomized pattern-matching algorithms, IBM journal of research and development, vol.31, issue.2, pp.249-260, 1987.

S. Kawaguchi, T. Yamashina, H. Uwano, K. Fushida, Y. Kamei et al., Shinobi: A tool for automatic code clone detection in the ide, 16th Working Conference on Reverse Engineering, p.29, 2009.

R. Komondoor and S. Horwitz, Using slicing to identify duplication in source code, International Static Analysis Symposium, pp.40-56, 2001.

K. A. Kontogiannis, R. Demori, E. Merlo, M. Galler, and M. Bernstein, Pattern matching for clone and concept detection, Automated Software Engineering, vol.3, issue.1-2, pp.77-108, 1996.

R. Koschke, R. Falke, and P. Frenzel, Clone detection using abstract syntax suffix trees, 13th Working Conference on Reverse Engineering, p.20, 2006.

D. Kramer, API documentation from source code comments: a case study of Javadoc, Proceedings of the 17th annual international conference on Computer documentation, pp.147-153, 1999.

J. Krinke, Identifying similar code with program dependence graphs, Proceedings Eighth Working Conference on Reverse Engineering, p.20, 2001.

B. Lague, D. Proulx, J. Mayrand, E. M. Merlo, and J. Hudepohl, Assessing the benefits of incorporating function clone detection in a development process, Proceedings International Conference on Software Maintenance, p.28, 1997.

A. Lakhotia, Understanding Someone else's Code: Analysis of Experiences, J. Syst. Softw, vol.23, issue.3, pp.269-275, 1993.

F. Lanubile and T. Mallardo, Finding function clones in web applications, Proceedings of the Seventh European Conference on Software Maintenance and Reengineering, p.19, 2003.

T. C. Lethbridge, J. Singer, and A. Forward, How Software Engineers Use Documentation: The State of the Practice, IEEE Softw, vol.20, issue.6, pp.35-39, 2003.

A. Lex, N. Gehlenborg, H. Strobelt, R. Vuillemot, and H. Pfister, UpSet: visualization of intersecting sets, IEEE transactions on visualization and computer graphics, vol.20, issue.12, pp.1983-1992, 2014.

L. Li, H. Feng, W. Zhuang, N. Meng, and B. Ryder, CCLearner: A deep learning-based clone detection approach, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), p.21, 2017.

Z. Li, S. Lu, S. Myagmar, and Y. Zhou, Cp-miner: Finding copy-paste and related bugs in large-scale software code, IEEE Transactions on software Engineering, vol.32, issue.3, pp.176-192, 2006.

Z. Li, G. Yin, Y. Yu, T. Wang et al., Detecting duplicate pull-requests in GitHub, Proceedings of the 9th Asia-Pacific Symposium on Internetware, p.26, 2017.

C. Liu, C. Chen, J. Han, and P. S. Yu, GPLAG: detection of software plagiarism by program dependence graph analysis, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.872-881, 2006.

H. Liu, Z. Ma, L. Zhang, and W. Shao, Detecting duplications in sequence diagrams based on suffix trees, Asia Pacific Software Engineering Conference, p.13, 2006.

D. Martin and J. R. Cordy, Analyzing web service similarity using contextual clones, Proceedings of the 5th International Workshop on Software Clones, pp.41-46, 2011.

J. Mayrand, Evaluating the benefits of clone detection in the software maintenance activities in large scale systems, WESS'96, 1996.

J. Mayrand, C. Leblanc, and E. Merlo, Experiment on the automatic detection of function clones in a software system using metrics, Proceedings of the International Conference on Software Maintenance (ICSM'96), p.19, 1996.

S. Mcintosh, M. Poehlmann, E. Juergens, A. Mockus, B. Adams et al., Collecting and leveraging a benchmark of build system clones to aid in quality assessments, Companion proceedings of the 36th international conference on software engineering, pp.145-154, 2014.

J. Miller, Triangulation as a basis for knowledge discovery in software engineering, Empirical Software Engineering, vol.13, issue.2, pp.223-228, 2008.

M. Monperrus, M. Eichberg, E. Tekes, and M. Mezini, What Should Developers Be Aware Of? An Empirical Study on the Directives of API Documentation, Empirical Software Engineering, vol.17, issue.6, pp.703-737, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00702183

S. Nejati, M. Sabetzadeh, M. Chechik, S. Easterbrook, and P. Zave, Matching and merging of statecharts specifications, Proceedings of the 29th international conference on Software Engineering, pp.54-64, 2007.

H. A. Nguyen, T. T. Nguyen, N. H. Pham, J. Al-kofahi, and T. N. Nguyen, Clone management for evolving software, IEEE transactions on software engineering, vol.38, issue.5, pp.1008-1026, 2011.

M. A. Oumaziz, A. Charpentier, J. Falleri, and X. Blanc, Documentation reuse: Hot or not? An empirical study, International Conference on Software Reuse, pp.12-27, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02182142

M. A. Oumaziz, J. Falleri, X. Blanc, T. F. Bissyandé, and J. Klein, Handling duplicates in dockerfiles families: Learning from experts, 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), p.83, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02485839

D. L. Parnas, A Technique for Software Module Specification with Examples, Commun. ACM, vol.15, pp.330-336, 1972.

T. J. Parr, Enforcing Strict Model-view Separation in Template Engines, Proceedings of the 13th International Conference on World Wide Web, WWW '04, pp.224-233, 2004.

J. R. Pérez-agüera, J. Arroyo, J. Greenberg, J. P. Iglesias, and V. Fresno, Using BM25F for semantic search, Proceedings of the 3rd international semantic search workshop, p.26, 2010.

M. Pollack, Code generation using javadoc, p.35, 2000.

D. C. Rajapakse and S. Jarzabek, Using server pages to unify clones in web applications: A trade-off analysis, 29th International Conference on Software Engineering (ICSE'07), p.25, 2007.

A. Raza, G. Vogel, and E. Plödereder, Bauhaus-a tool suite for program analysis and reverse engineering, International Conference on Reliable Software Technologies, pp.71-82, 2006.

M. Rieger, Effective clone detection without language barriers, 2005.

M. Rieger, S. Ducasse, and M. Lanza, Insights into system-wide code duplication, 11th Working Conference on Reverse Engineering, p.87, 2004.

C. K. Roy and J. R. Cordy, A survey on software clone detection research, Queen's School of Computing TR, vol.541, pp.64-68, 2007.

C. K. Roy, J. R. Cordy, and R. Koschke, Comparison and evaluation of code clone detection techniques and tools: A qualitative approach, Science of computer programming, vol.74, issue.7, pp.470-495, 2009.

C. K. Roy, M. F. Zibran, and R. Koschke, The vision of software clone management: Past, present, and future (keynote paper), 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, p.28, 2014.

A. Saebjørnsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su, Detecting code clones in binary executables, Proceedings of the eighteenth international symposium on Software testing and analysis, pp.117-128, 2009.

H. Sajnani, V. Saini, J. Svajlenko, C. K. Roy, and C. V. Lopes, SourcererCC: Scaling code clone detection to big-code, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), p.17, 2016.

C. B. Seaman, Qualitative methods in empirical studies of software engineering, IEEE Transactions on software engineering, vol.25, issue.4, pp.557-572, 1999.

T. Sharma, M. Fragkoulis, and D. Spinellis, Does your configuration code smell?, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), p.25, 2016.

H. Störrle, Towards clone detection in UML domain models, Software & Systems Modeling, vol.12, issue.2, pp.307-329, 2013.

C. Sun, D. Lo, S. Khoo, and J. Jiang, Towards more accurate retrieval of duplicate bug reports, Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, pp.253-262, 2011.

C. Sun, D. Lo, X. Wang, J. Jiang, and S. Khoo, A discriminative model approach for accurate duplicate bug report retrieval, Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, vol.1, pp.45-54, 2010.

A. Sureka and P. Jalote, Detecting duplicate bug report using character n-gram-based features, 2010 Asia Pacific Software Engineering Conference, p.26, 2010.

E. B. Swanson, The dimensions of maintenance, Proceedings of the 2nd international conference on Software engineering, pp.492-497, 1976.

N. Synytskyy, J. R. Cordy, and T. Dean, Resolution of static clones in dynamic web pages, Fifth IEEE International Workshop on Web Site Evolution, p.29, 2003.

R. Tairas and J. Gray, Phoenix-based clone detection using suffix trees, Proceedings of the 44th annual Southeast regional conference, pp.679-684, 2006.

M. Tatsubori and T. Suzumura, HTML Templates That Fly: A Template Engine Approach to Automated Offloading from Server to Client, Proceedings of the 18th International Conference on World Wide Web, WWW '09, pp.951-960, 2009.

D. Van-heesch, , p.35, 2004.

M. L. Vanter, The documentary structure of source code, Information and Software Technology, vol.44, pp.767-782, 2002.

V. Wahler, D. Seipel, J. Wolff, and G. Fischer, Clone detection in source code by frequent itemset techniques, Fourth IEEE International Workshop on Source Code Analysis and Manipulation, p.18, 2004.

M. White, M. Tufano, C. Vendome, and D. Poshyvanyk, Deep learning code fragments for code clone detection, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp.87-98, 2016.

C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell et al., Experimentation in software engineering, pp.52-77, 2012.

M. Wood, J. Daly, J. Miller, and M. Roper, Multi-method research: An empirical investigation of object-oriented technology, Journal of Systems and Software, vol.48, issue.1, pp.13-26, 1999.

X. Yan, J. Han, and R. Afshar, CloSpan: Mining closed sequential patterns in large datasets, Proceedings of the 2003 SIAM international conference on data mining, p.17, 2003.

W. Yang, Identifying syntactic differences between two programs, Software: Practice and Experience, vol.21, pp.739-755, 1991.

G. Zhang, X. Peng, Z. Xing, S. Jiang, H. Wang et al., Towards contextual and on-demand code clone management by continuous monitoring, 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), p.28, 2013.

J. Zhou and H. Zhang, Learning to rank duplicate bug reports, Proceedings of the 21st ACM international conference on Information and knowledge management, p.26, 2012.

M. F. Zibran, R. K. Saha, M. Asaduzzaman, and C. K. Roy, Analyzing and forecasting near-miss clones in evolving software: An empirical study, 16th IEEE International Conference on Engineering of Complex Computer Systems, p.87, 2011.
