Abstract
Robustness, reliability, resilience, and elasticity (R3E) are important concerns in machine learning (ML) systems, and they must be traded off against efficiency. However, they need to be supported and optimized end to end, not only at the level of ML models. In this chapter, we present a conceptual approach to the architectural design and engineering of R3E for end-to-end big data machine learning (BDML) systems at runtime. We propose quality of analytics as a contractual means for optimizing end-to-end BDML systems. Based on that, we propose to define and abstract diverse types of components as R3E objects and to devise operations and metrics for managing R3E attributes. Through a set of proposed coordination, monitoring, analytics, and testing methods, we identify essential tasks for tackling R3E concerns when developing BDML systems. Finally, we illustrate our approach with an example of an end-to-end BDML system for building object classification.
Original language | English |
---|---|
Title of host publication | AI Assurance: Towards Trustworthy, Explainable, Safe, and Ethical AI |
Editors | Feras Batarseh, Laura Freeman |
Publisher | Elsevier |
Pages | 339-367 |
ISBN (Electronic) | 978-0-323-91882-4 |
ISBN (Print) | 978-0-323-91919-7 |
Publication status | Published - 2022 |
MoE publication type | A3 Book section, Chapters in research books |