Title
Extended mpiJava for distributed checkpointing and recovery
Date Issued
01 January 2006
Access level
metadata only access
Resource Type
conference paper
Author(s)
Hernández E.
Pereira W.
Publisher(s)
Springer Verlag
Abstract
In this paper we describe an mpiJava extension that implements a parallel checkpointing/recovery service. This checkpointing/recovery facility is transparent to applications, i.e., no instrumentation is needed. We use a distributed approach for taking the checkpoints, which means that the processes take their local checkpoints independently. This approach reduces communication between processes, and there is no need for a central server for checkpoint storage. We present some experiments which suggest that this extended MPI functionality does not incur a significant performance penalty, apart from the well-known penalties related to local checkpoint generation. © Springer-Verlag Berlin Heidelberg 2006.
Start page
158
End page
165
Volume
4192 LNCS
Language
English
OECD Knowledge area
Systems and communications engineering, Computer science
Scopus EID
2-s2.0-33750266875
ISSN of the container
0302-9743
ISBN of the container
9783540391104
Conference
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): 13th European PVM/MPI User's Group Meeting
Sources of information: Directorio de Producción Científica, Scopus