On Optimizing Resources for Real-time End-to-End Machine Learning in Heterogeneous Edges

Research output: Working paper / Preprint, Scientific


Abstract

Employing Machine Learning (ML) services within applications at the edge has become a viable way to meet performance requirements and data regulations. With a microservice architecture, these applications can scale dynamically, improving service availability under varying workloads. However, provisioning shared resources for multiple end-to-end ML applications poses numerous challenges. Prevalent orchestration tools and frameworks that support edge ML serving provision resources inefficiently, struggling with constrained capacity and diverse resource demands. In this work, we present a provisioning method that optimizes resource utilization for end-to-end ML applications on a heterogeneous edge. By profiling every microservice within an application, we estimate its scale and allocate it to a hardware platform with sufficient resources, taking its runtime utilization patterns into account. We also provide several practical analyses of runtime monitoring metrics to detect and mitigate resource contention, guaranteeing performance. Experiments with three real-world ML applications demonstrate the practicality of our method on a heterogeneous edge cluster of Raspberry Pis and Jetson Developer Kits.
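The abstract describes estimating each microservice's scale from its profile and allocating replicas to hardware with sufficient resources. The sketch below illustrates that idea only in outline, under assumptions not taken from the paper: the `Profile`, `Node`, `required_replicas`, and `allocate` names are hypothetical, the scale estimate is a simple throughput ratio, and placement is a greedy first-fit rather than the authors' actual algorithm.

```python
import math
from dataclasses import dataclass

@dataclass
class Profile:
    # Hypothetical per-replica runtime profile of one microservice
    name: str
    cpu: float         # cores consumed at the profiled load
    mem: float         # MiB consumed at the profiled load
    throughput: float  # requests/s one replica sustains

@dataclass
class Node:
    # Hypothetical edge device with spare capacity
    name: str
    cpu: float
    mem: float

def required_replicas(p: Profile, demand: float) -> int:
    """Estimate scale: replicas needed to sustain `demand` requests/s."""
    return max(1, math.ceil(demand / p.throughput))

def allocate(profiles, demands, nodes):
    """Greedy first-fit: place each replica on the first node
    with enough remaining CPU and memory."""
    free = {n.name: [n.cpu, n.mem] for n in nodes}
    placement = []
    for p in profiles:
        for _ in range(required_replicas(p, demands[p.name])):
            for n in nodes:
                cpu_left, mem_left = free[n.name]
                if cpu_left >= p.cpu and mem_left >= p.mem:
                    free[n.name][0] -= p.cpu
                    free[n.name][1] -= p.mem
                    placement.append((p.name, n.name))
                    break
            else:
                raise RuntimeError(f"no capacity for {p.name}")
    return placement

# Illustrative two-stage pipeline on a two-node heterogeneous cluster
profiles = [
    Profile("preprocess", cpu=0.5, mem=256.0, throughput=20.0),
    Profile("infer", cpu=1.0, mem=512.0, throughput=10.0),
]
demands = {"preprocess": 40.0, "infer": 20.0}
nodes = [Node("pi", cpu=4.0, mem=2048.0), Node("jetson", cpu=6.0, mem=4096.0)]
placement = allocate(profiles, demands, nodes)
```

In this toy instance both services need two replicas each, and all four fit on the first node; a real provisioner would also weigh the utilization patterns and contention signals the paper analyzes.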
Original language: English
Publication status: Submitted - 29 Jan 2024
MoE publication type: Not Eligible

Publication series

Name: Software: Practice and Experience
Publisher: John Wiley & Sons
ISSN (Print): 0038-0644

Keywords

  • Resource Provisioning
  • End-to-end Machine Learning
  • Real-time Serving
  • Heterogeneous Edge
