MULTI FAILURE ANALYSIS IN PARALLEL PROCESSING USING GIFT TOOL |
Author(s): |
KR.Senthil Murugan |
Keywords: |
Fault tolerance, fault-tolerant parallel algorithm, fast self-recovery, parallel recomputing. |
Abstract |
As the size of large-scale computer systems increases, their mean time between failures (MTBF) becomes significantly shorter than the execution time of many current scientific applications. To run to completion, these applications must tolerate hardware failures. Conventional rollback-recovery protocols redo the computation of the crashed process since the last checkpoint on a single processor. As a result, the recovery time of all such protocols is no less than the time between the last checkpoint and the crash. In this paper, we propose a new application-level fault-tolerant approach for parallel applications called the Fault-Tolerant Parallel Algorithm (FTPA), which provides fast self-recovery. When fail-stop failures occur and are detected, all surviving processes recompute the workload of the failed processes in parallel. Because FTPA requires the user to be involved in fault tolerance, we present Get it Fault-Tolerant (GiFT), a source-to-source precompiler tool that automates the FTPA implementation. The experimental results show that FTPA performs better than the traditional checkpointing approach. |
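The contrast between single-processor rollback and parallel recomputing can be illustrated with a minimal sketch. It is not the paper's implementation or GiFT's generated code: the function names, the evenly divisible workload, and the `overhead` parameter are all assumptions made for illustration, and communication costs are ignored unless passed in explicitly.

```python
def rollback_recovery_time(lost_work: float) -> float:
    """Idealized rollback recovery: each crashed process redoes its own
    lost work (since the last checkpoint) on a single processor."""
    return lost_work


def parallel_recovery_time(lost_work: float, total_procs: int,
                           failed_procs: int, overhead: float = 0.0) -> float:
    """Idealized parallel recomputing: the survivors share the lost work of
    all failed processes evenly. `overhead` is a hypothetical term for
    redistribution cost; it is an assumption, not a measured quantity."""
    survivors = total_procs - failed_procs
    if survivors <= 0:
        raise ValueError("at least one process must survive")
    return failed_procs * lost_work / survivors + overhead


def split_lost_work(start: int, stop: int, survivors: list[int]) -> dict[int, range]:
    """Divide the failed process's iteration range [start, stop) among the
    surviving ranks as evenly as possible (hypothetical helper)."""
    if not survivors:
        raise ValueError("at least one process must survive")
    base, extra = divmod(stop - start, len(survivors))
    shares = {}
    cursor = start
    for i, rank in enumerate(survivors):
        # The first `extra` survivors take one additional iteration.
        size = base + (1 if i < extra else 0)
        shares[rank] = range(cursor, cursor + size)
        cursor += size
    return shares
```

Under these assumptions, with 64 processes, one failure, and 600 s of work lost since the last checkpoint, `rollback_recovery_time(600.0)` gives 600 s while `parallel_recovery_time(600.0, 64, 1)` gives about 9.5 s. Real speedups would be smaller once redistribution and communication costs are included.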
Other Details |
Paper ID: IJSARTV; Published in: Volume 3, Issue 7; Publication Date: 7/2/2017 |