Quality-driven Inference Orchestration for Multi-modality Machine Learning Systems at the Edge

Tri Nguyen, Anh-Dung Nguyen, Linh Truong

Research output: Contribution to journal › Article › Scientific

Abstract

Multi-modality machine learning (ML) systems are increasingly deployed in edge environments, for example in smart building and autonomous robotic applications. Such systems enable the analysis of complex subjects by performing ML inference on multiple data sources. However, operating them for multi-tenant applications presents a runtime orchestration challenge: optimizing the execution of multiple inference tasks across modalities. Analysis requests from different tenants often rely on different data sources with distinct quality requirements, resulting in a wide range of (conflicting) optimization objectives for runtime inference orchestration. The diversity of ML models and their performance variations further complicates the orchestration, particularly when scheduling inference tasks and distributing inference workloads on a heterogeneous edge system. This paper addresses these challenges by introducing an adaptive orchestration for multi-tenant applications in a multi-modality ML system. Our orchestration supports dynamic trade-offs between quality refinement and the inference capability of inference services while scheduling multiple inference tasks under dynamic time constraints. The orchestration employs an efficient mechanism for selecting instances of inference services to distribute inference workloads across a heterogeneous edge cluster. This mechanism allows the orchestration to leverage cross-modal information to refine inference quality in line with tenant objectives. In addition, our orchestration accounts for the performance variation caused by resource contention, runtime failures, and explainability overheads while distributing workloads. Extensive experiments demonstrate that our orchestration can improve inference accuracy by up to 10% and reduce the number of late responses by 19% in real-world scenarios.
Original language: English
Journal: Under Submission
Publication status: Submitted - 12 Aug 2025
MoE publication type: B1 Non-refereed journal articles

Keywords

  • Multi-modality ML System
  • Edge Inference
  • ML Orchestration
  • End-to-end ML Serving
