
On the identification of performance bottlenecks in multi-tier distributed systems

Abstract : Today's distributed systems are made of various software components with complex interactions and a large number of configuration settings. Pinpointing the performance bottlenecks is generally a non-trivial task, which requires human expertise as well as trial and error. Moreover, the same software stack may exhibit very different bottlenecks depending on factors such as the underlying hardware, the application logic, the configuration settings, and the operating conditions. This work aims to (i) investigate whether it is possible to identify a set of key metrics that can be used as reliable and general indicators of performance bottlenecks, (ii) identify the characteristics of these indicators, and (iii) build a tool that can automatically and accurately determine if the system reaches its maximum capacity in terms of throughput.

In this thesis, we present three contributions. First, we present an analytical study of a large number of realistic configuration setups of multi-tier distributed applications, more specifically focusing on data processing pipelines. By analyzing a large number of metrics at the hardware and at the software level, we identify the ones that exhibit changes in their behavior at the point where the system reaches its maximum capacity. We consider these metrics as reliable indicators of performance bottlenecks. Second, we leverage machine learning techniques to build a tool that can automatically identify performance bottlenecks in the data processing pipeline. We consider different machine learning methods, different selections of metrics, and different cases of generalization to new setups. Third, to assess the validity of the results obtained considering the data processing pipeline for both the analytical and the learning-based approaches, the two approaches are applied to the case of a Web stack.

From our research, we draw several conclusions.
First, it is possible to identify key metrics that act as reliable indicators of performance bottlenecks for a multi-tier distributed system. More precisely, the point at which the server reaches its maximum capacity can be identified based on these reliable metrics. Contrary to the approach adopted by many existing works, our results show that a combination of metrics of different types is required to ensure reliable identification of performance bottlenecks in a large number of setups. We also show that approaches based on machine learning techniques to analyze metrics can identify performance bottlenecks in a multi-tier distributed system. The comparison of different models shows that the ones based on the reliable metrics identified by our analytical study are the ones that achieve the best accuracy. Furthermore, our extensive analysis shows the robustness of the obtained models, which can generalize to new setups, to new numbers of clients, and to both new setups and new numbers of clients. Extending the analysis to a Web stack confirms the main findings obtained through the study of the data processing pipeline. These results pave the way towards a general and accurate tool to identify performance bottlenecks in distributed systems.
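To make the notion of "reaching maximum capacity in terms of throughput" concrete, here is a minimal sketch of one possible way to label load levels as saturated or not. It is not the method used in the thesis: the function name `saturation_labels`, the 5% tolerance, and the sample numbers are all assumptions chosen for illustration. Labels of this kind could plausibly serve as the target that a metric-based classifier learns to predict.

```python
def saturation_labels(throughput, tolerance=0.05):
    """Label each load level as saturated (True) or not (False).

    Hypothetical rule: a level is saturated if it is at or after the
    first level whose throughput is within `tolerance` (relative) of
    the peak throughput observed across all levels.
    """
    if not throughput:
        return []
    peak = max(throughput)
    threshold = peak * (1.0 - tolerance)
    # Index of the first measurement that comes close to the peak.
    first = next(i for i, t in enumerate(throughput) if t >= threshold)
    return [i >= first for i in range(len(throughput))]


# Made-up measurements: requests per second at increasing client counts.
clients = [10, 20, 40, 80, 160]
throughput = [1000.0, 2000.0, 3900.0, 4000.0, 3950.0]

labels = saturation_labels(throughput)
for n, t, s in zip(clients, throughput, labels):
    print(f"{n:>4} clients: {t:>7.1f} req/s  saturated={s}")
```

In this sketch the label depends on throughput alone. The abstract's point is that, at run time, a combination of hardware-level and software-level metrics is needed to recognize the same condition reliably across setups.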

Cited literature : 152 references
Contributor : Abes Star
Submitted on : Tuesday, November 3, 2020 - 9:47:35 AM
Last modification on : Wednesday, December 2, 2020 - 5:43:12 PM
Long-term archiving on : Thursday, February 4, 2021 - 6:12:06 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02986536, version 1



Maha Alsayasneh. On the identification of performance bottlenecks in multi-tier distributed systems. Hardware Architecture [cs.AR]. Université Grenoble Alpes [2020-..], 2020. English. ⟨NNT : 2020GRALM009⟩. ⟨tel-02986536⟩


