On the Effect of Replication of Input Files on the Efficiency and the Robustness of a Set of Computations

Thomas Lambert 1, 2
2 Realopt - Reformulations based algorithms for Combinatorial Optimization
LaBRI - Laboratoire Bordelais de Recherche en Informatique, IMB - Institut de Mathématiques de Bordeaux, Inria Bordeaux - Sud-Ouest
Abstract: The increasing importance of High Performance Computing (HPC) and Big Data applications creates new issues in parallel computing. One of them is communication, that is, the data transferred from one processor to another. Such data movements affect computation time, inducing delays and increasing energy consumption. Although replication, of either tasks or files, generates communication, it is also an important tool to improve resiliency and parallelism. In this thesis, we focus on the impact of the replication of input files on the overall amount of communication. For this purpose, we concentrate on two practical problems. The first one is parallel matrix multiplication. In this problem, the goal is to induce as few replications as possible in order to decrease the amount of communication. The second problem is the scheduling of the “Map” phase in the MapReduce framework. In this case, replication is an input of the problem, and the goal is to use it in the best possible way. In addition to the replication issue, this thesis also compares static and dynamic approaches for scheduling. Static approaches compute schedules before the computation starts, while dynamic approaches compute them during the computation itself. In this thesis we design hybrid strategies in order to take advantage of the strengths of both. First, we relate communication-avoiding matrix multiplication to a square partitioning problem, where load-balancing is given as an input. In this problem, the goal is to split a square into zones (whose areas depend on the relative speeds of the resources) while minimizing the sum of their half-perimeters. We improve the existing results in the literature for this problem with two additional approximation algorithms. In addition, we propose an alternative model using a cube partitioning problem.
We prove the NP-completeness of the associated decision problem and we design two approximation algorithms. Finally, we implement the algorithms for both problems in order to provide a comparison of the schedules for matrix multiplication. For this purpose, we rely on the StarPU library. Second, for the scheduling of the Map phase of MapReduce, the input files are replicated and distributed among the processors. For this problem we propose two metrics. In the first one, we forbid non-local tasks (a task is non-local if it is processed on a processor that does not own its input files) and, under this constraint, we aim at minimizing the makespan. In the second one, we allow non-local tasks and we aim at minimizing their number while also minimizing the makespan. For the theoretical study, we focus on tasks with homogeneous computation times. First, we relate a greedy algorithm for the makespan metric to a “balls-into-bins” process, proving that this algorithm produces solutions with expected overhead (the difference between the number of tasks on the most loaded processor and the number of tasks in a perfect distribution) equal to O(m log m), where m denotes the number of processors. Second, we relate this scheduling problem (with forbidden non-local tasks) to a graph orientation problem and therefore prove, using results from the literature, that there exists, with high probability, a near-perfect assignment (whose overhead is at most 1). In addition, there are polynomial-time optimal algorithms. For the communication metric, we provide new algorithms based on a graph model close to matching problems in bipartite graphs. We prove that these algorithms are optimal for both the communication and makespan metrics. Finally, we provide simulations based on traces from a MapReduce cluster to test our strategies in realistic settings, and show that the algorithms we propose perform very well when the variance of the computation times of the different tasks of a job is low or medium.
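The local-only makespan setting and the overhead measure described above can be illustrated with a small Python sketch. It is not the greedy algorithm analyzed in the thesis: the task-by-task "least-loaded owner" rule, the uniform random replica placement, and all names (`place_replicas`, `greedy_local_schedule`, `overhead`) are assumptions made for illustration. Tasks have unit computation time, and the perfect distribution is taken to be ceil(n / m) tasks on the most loaded processor.

```python
import math
import random


def place_replicas(num_tasks, num_procs, replication, rng):
    """For each task, pick the distinct processors holding a copy of its input.

    Uniform random placement is an assumption of this sketch.
    """
    return [rng.sample(range(num_procs), replication) for _ in range(num_tasks)]


def greedy_local_schedule(replicas, num_procs):
    """Assign each task to the least-loaded processor that owns its input.

    No non-local task is ever created; ties go to the first owner listed.
    """
    load = [0] * num_procs
    assignment = []
    for owners in replicas:
        target = min(owners, key=lambda p: load[p])
        load[target] += 1
        assignment.append(target)
    return assignment, load


def overhead(load, num_tasks):
    """Tasks on the most loaded processor minus ceil(n / m)."""
    return max(load) - math.ceil(num_tasks / len(load))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    n, m, r = 1000, 50, 3   # hypothetical sizes: tasks, processors, replicas
    replicas = place_replicas(n, m, r, rng)
    assignment, load = greedy_local_schedule(replicas, m)
    print("overhead:", overhead(load, n))
```

With unit tasks, the makespan equals the maximum load, so the overhead printed here is the gap to a perfectly balanced schedule. Varying `r` in such a sketch gives a rough feel for how more replicas leave the scheduler more local choices.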
Thomas Lambert. On the Effect of Replication of Input Files on the Efficiency and the Robustness of a Set of Computations. Other [cs.OH]. Université de Bordeaux, 2017. English. ⟨NNT : 2017BORD0656⟩. ⟨tel-01661588⟩



