Towards an Enhancement Effort Estimation Approach using Machine Learning Techniques

Zaineb Sakhrawi

Abstract

Effort estimation is often regarded as one of the biggest challenges in software organizations. Many projects finish late, over budget, with less functionality than planned, and with no indication of their quality level. Inaccurate estimates strongly affect project success because they raise unrealistic expectations and contribute to customer dissatisfaction, whereas accurate estimates support appropriate decisions at the right time. Enhancement requests, which add new requirements, improve existing requirements, or change how a software product is used, are a further source of estimation error. They can increase the cost of software development or enhancement (maintenance) projects, disrupt the project schedule, and even affect the quality of the final product. Many approaches with various estimation models have been proposed to provide more accurate effort estimates for software development and enhancement projects. These models fall into three main categories: expert judgment, algorithmic models (e.g., COCOMO II), and non-algorithmic models (such as Machine Learning (ML) techniques). Several researchers agree that ML techniques are more effective than the other estimation techniques. To address the problems listed above, this thesis makes the following contributions.

The first contribution is a review of research on estimating the effort required to complete an enhancement in software projects, conducted as a Systematic Mapping Study (SMS) in software engineering [1]. The SMS surveyed relevant papers published from 1995 to 2020 to determine the main factors used in evaluating an Enhancement Request (ER) and in estimating the corresponding effort using ML techniques. It selected 30 relevant studies, 19 journal articles and 11 conference papers, retrieved through four search engines (Google Scholar, IEEE Xplore, ACM Digital Library, and ScienceDirect). This review helps researchers identify and structure the methods used for effort estimation in software development and enhancement projects. The results showed that very little work has investigated the effort required to implement an enhancement in software enhancement projects, and that most of the proposed approaches used ML techniques.

The second contribution is a new approach for estimating the effort required to implement an enhancement in software requirements. This approach has two phases. The first phase proposes an Ontology-based Model Classification (OMC) that classifies each customer ER as either a Functional Change or a Technical Change. The classification was evaluated experimentally on real projects from the software industry and on the PROMISE repository. It lets managers and stakeholders apply the Functional Size Measurement (FSM) method selectively. We then built a dataset by associating each ER with its corresponding effort, obtained through expert judgment. The second phase performs Software Enhancement Effort Estimation (SEEE) using the dataset built in the first phase. Four ML methods were selected to make the prediction: Ada Boost Regressor (ABR), Gradient Boosting Regressor (GBR), Linear Support Vector Regression (LinearSVR), and Random Forest Regression (RFR). Results showed that SEEE accuracy improves when the ontology is used together with the RFR algorithm.

The third contribution investigates how the functional size of an enhancement, measured with the IFPUG and COSMIC FSM methods, affects the accuracy of SEEE. The results indicate that the second-generation COSMIC method is more effective than the first-generation IFPUG method for sizing an enhancement and the resulting software product, and for using that size to estimate enhancement effort.

The fourth contribution uses the Correlated Feature Selection (CFS) algorithm to select the most relevant features from the ISBSG (International Software Benchmarking Standards Group) repository. Applying CFS showed a strong correlation between size and software enhancement effort. The M5P algorithm was then used to produce the SEEE, and its performance was compared against three ML regression techniques: GBR, LinearSVR, and RFR. Results showed that SEEE accuracy improved when the CFS algorithm was used with the M5P algorithm.

The fifth contribution proposes a new approach that investigates the use of a "Stacking Ensemble" model to increase SEEE accuracy. The constructed Stacking Ensemble model combines three regression models: GBR, LinearSVR, and RFR. Compared to the approach based on a single learning model (M5P), the Stacking Ensemble model gives more accurate results.

The sixth contribution is a Web application named "ERWebApp" for producing an SEEE quickly. The application first generates the functional size of an enhancement, then estimates the corresponding effort using the "Stacking Ensemble" model.
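
As a rough illustration of the Stacking Ensemble idea from the fifth contribution, the following Python sketch combines the three named regressors with scikit-learn. The synthetic size and effort data, the hyperparameters, the linear meta-learner, and the helper name `build_stacking_model` are all assumptions made for illustration; the abstract does not specify them, and this is not the thesis's implementation.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic data (an assumption, not thesis data): one feature, the
# functional size of an enhancement, and a noisy effort value in hours.
rng = np.random.default_rng(0)
size = rng.uniform(5, 200, size=120)
effort = 8.0 * size + rng.normal(0.0, 40.0, size=120)
X = size.reshape(-1, 1)


def build_stacking_model(seed=0):
    """Hypothetical helper: stack GBR, LinearSVR and RFR base learners."""
    base_learners = [
        ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=seed)),
        # LinearSVR is scale-sensitive, so it gets its own scaler.
        ("svr", make_pipeline(
            StandardScaler(),
            LinearSVR(C=100.0, max_iter=10000, random_state=seed),
        )),
        ("rfr", RandomForestRegressor(n_estimators=50, random_state=seed)),
    ]
    # The meta-learner is a guess; the abstract does not name one.
    return StackingRegressor(
        estimators=base_learners,
        final_estimator=LinearRegression(),
        cv=5,
    )


# Train on the first 90 rows and evaluate on the remaining 30.
model = build_stacking_model()
model.fit(X[:90], effort[:90])
predicted = model.predict(X[90:])
mae = float(np.mean(np.abs(predicted - effort[90:])))
print(f"Mean absolute error on held-out rows: {mae:.1f} hours")
```

In this sketch the base learners produce cross-validated predictions, and the meta-learner is fitted on those predictions rather than on the raw features, which is what distinguishes stacking from simple averaging.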


