Title
Scanflow: An end-to-end agent-based autonomic ML workflow manager for clusters
Date Issued
06 December 2021
Access level
open access
Resource Type
conference paper
Author(s)
Universitat Politècnica de Catalunya
Publisher(s)
Association for Computing Machinery, Inc.
Abstract
Machine Learning (ML) is more than just training models; the whole life-cycle must be considered. Once deployed, an ML model needs to be constantly managed, supervised and debugged to guarantee its availability, validity and robustness in dynamic contexts. This demonstration presents an agent-based ML workflow manager called Scanflow, which enables autonomic management and supervision of the end-to-end life-cycle of ML workflows on distributed clusters. The case study on an MNIST project shows that different teams can collaborate using Scanflow at different phases of an ML project, and that agents are effective at maintaining the accuracy and serving throughput of the model while it runs in production.
Start page
1
End page
2
Language
English
OECD Knowledge area
Automation systems, Control systems
Subjects
Publication version
Version of Record
Scopus EID
2-s2.0-85121465594
Resource of which it is part
Middleware 2021 Demos and Posters - Proceedings of the 2021 International Middleware Conference Demos and Posters
ISBN of the container
978-145039154-2
Conference
22nd International Middleware Conference, Middleware 2021
Sponsor(s)
This work was partially supported by Lenovo as part of the Lenovo-BSC 2020 collaboration agreement, by the Spanish Government under contract PID2019-107255GB-C22, and by the Generalitat de Catalunya under contract 2017-SGR-1414 and under grant 2020 FI-B 00257.
Sources of information:
Directorio de Producción Científica
Scopus