Coordination-aware assurance for end-to-end machine learning systems: the R3E approach

Research output: Chapter in Book/Report/Conference proceeding; Chapter; Scientific; peer-review

Abstract

Robustness, reliability, resilience, and elasticity are important concerns in Machine Learning (ML) systems, and they must be traded off against efficiency factors. However, they need to be supported and optimized end to end, not just for ML models. In this chapter we present a conceptual approach to the architectural design and engineering of robustness, reliability, resilience, and elasticity (R3E) for end-to-end big data ML systems at runtime. We propose quality of analytics as a contractual means for optimizing end-to-end big data machine learning (BDML) systems. Based on that, we propose to define and abstract diverse types of components under R3E objects and to devise operations and metrics for managing R3E attributes. Through a set of proposed coordination, monitoring, analytics, and testing methods, we identify essential tasks for tackling R3E concerns when developing BDML systems. Finally, we illustrate our approach with an example of an end-to-end BDML system for building object classification.
Original language: English
Title of host publication: AI Assurance: Towards Trustworthy, Explainable, Safe, and Ethical AI
Editors: Feras Batarseh, Laura Freeman
Publisher: Elsevier
Pages: 339-367
ISBN (Electronic): 978-0-323-91882-4
ISBN (Print): 978-0-323-91919-7
Publication status: Published - 2022
MoE publication type: A3 Book section, Chapters in research books
