Managing large-scale, distributed systems research experiments with control-flows

Tomasz Buchert 1
1 MADYNES - Management of dynamic networks and services
Inria Nancy - Grand Est, LORIA - NSS - Department of Networks, Systems and Services
Abstract : Running experiments on modern systems such as supercomputers, cloud infrastructures or P2P networks has become very complex, both technically and methodologically. It has proved difficult to run experiments correctly and to understand the results obtained, even with a background in the technology and methods employed. Moreover, large-scale experiments suffer from the erroneous and unpredictable behavior of the underlying software and hardware, which undermines the scientific principles of experimental computer science. This worrisome state of research on large-scale distributed systems calls for new approaches to designing, running and interpreting experiments. This work explores the use of control-flows (business processes) as a model for representing large-scale experiments in research on distributed systems. We set out to find the advantages, disadvantages and limitations of this approach, as well as practical considerations for future implementers. We make three main contributions. First, we analyze the current state of experiment management tools, their limits and their features, to better understand the difficulties that lie ahead, and we construct a general framework to evaluate tools of this type. Second, we design and implement an experiment management tool based on the control-flow model. We show that this methodology can be implemented and used in practice to run challenging, large-scale experiments while offering a wide set of features, some of which are missing from previous approaches. Finally, we analyze the use of provenance in computer science, and in particular in experimental research on distributed systems, and propose a provenance collection system that emerges from the control-flow model used to represent experiments. The design is implemented and shown to collect provenance in an efficient and automatic way. Our results show that workflows are a viable model for the design and execution of experiments in distributed systems research. With these positive conclusions in mind, we also sketch future research directions for improving our work.

Cited literature : 198 references

https://tel.archives-ouvertes.fr/tel-01273964
Contributor : Tomasz Buchert
Submitted on : Monday, February 15, 2016 - 2:52:08 AM
Last modification on : Tuesday, February 5, 2019 - 2:46:01 PM
Long-term archiving on : Saturday, November 12, 2016 - 8:10:17 PM

Identifiers

  • HAL Id : tel-01273964, version 1

Citation

Tomasz Buchert. Managing large-scale, distributed systems research experiments with control-flows. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Lorraine, 2016. English. ⟨tel-01273964⟩
