Abstract

Proposal generation is a fundamental yet challenging task for two-stage temporal action detection pipelines. The task aims to predict the starting and ending boundaries of segments in realistic video sequences; action recognition methods cannot be applied directly to such videos because they are untrimmed. Most state-of-the-art models rely on temporal convolutional neural networks with pre-defined anchor segments. By eliminating anchors, we propose a lighter, end-to-end trainable Anchor-Free Multiscale Transformer-based Generator (AMTG) model that uses local clues from video snippets. To improve effectiveness for temporal evaluation, we apply multiscale Transformer encoders to sequences with a bi-directional mask extension, simultaneously predicting boundary distances with uncertainties and various snippet-based local scores. The model then integrates these local predictions to generate proposal candidates using the proposed scoring function. Experiments on the THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of AMTG for the temporal proposal generation task.
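
The anchor-free idea in the abstract can be illustrated with a minimal sketch: each snippet predicts its distance to the start and end boundary, and those local predictions are turned into proposal candidates. Everything named here is an assumption for illustration only. The function `decode_proposals`, its parameters, and in particular the confidence formula are hypothetical placeholders; the abstract does not specify the actual AMTG scoring function, and this is not it.

```python
from typing import List, Tuple


def decode_proposals(
    start_dist: List[float],
    end_dist: List[float],
    start_sigma: List[float],
    end_sigma: List[float],
    local_score: List[float],
) -> List[Tuple[float, float, float]]:
    """Turn per-snippet predictions into (start, end, confidence) candidates.

    All positions are in snippet-index units. Snippet t predicts the
    distance back to the segment start and forward to the segment end,
    each with an uncertainty value (hypothetical inputs).
    """
    n = len(local_score)
    proposals = []
    for t in range(n):
        # Anchor-free decoding: boundaries are offsets from the snippet itself.
        start = max(0.0, t - start_dist[t])
        end = min(float(n - 1), t + end_dist[t])
        if end <= start:
            continue  # skip degenerate segments
        # Placeholder confidence (an assumption, not the paper's scoring
        # function): discount the local score by the boundary uncertainties.
        confidence = local_score[t] / (1.0 + start_sigma[t] + end_sigma[t])
        proposals.append((start, end, confidence))
    # Highest-confidence candidates first.
    proposals.sort(key=lambda p: p[2], reverse=True)
    return proposals


if __name__ == "__main__":
    candidates = decode_proposals(
        start_dist=[0.0, 1.0, 2.0],
        end_dist=[2.0, 1.0, 0.0],
        start_sigma=[0.5, 0.0, 0.5],
        end_sigma=[0.5, 0.0, 0.5],
        local_score=[0.6, 0.9, 0.6],
    )
    for start, end, confidence in candidates:
        print(f"[{start:.1f}, {end:.1f}] confidence={confidence:.2f}")
```

In this toy input all three snippets point at the same segment, and the snippet with the lowest predicted uncertainty ranks first. A real pipeline would typically follow this step with some form of duplicate suppression, which the sketch leaves out.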

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher: IEEE
Pages: 1853-1858
Number of pages: 6
ISBN (Electronic): 978-1-6654-6891-6
Publication status: Published - 2023
MoE publication type: A4 Conference publication
Event: IEEE International Conference on Multimedia and Expo - Brisbane, Australia
Duration: 10 Jul 2023 to 14 Jul 2023

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Publisher: IEEE
Volume: 2023-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: IEEE International Conference on Multimedia and Expo
Abbreviated title: ICME
Country/Territory: Australia
City: Brisbane
Period: 10/07/2023 to 14/07/2023

Keywords

  • anchor-free
  • multiscale transformer network
  • temporal action proposals
  • two-stage detectors

Fingerprint

Dive into the research topics of 'Anchor-Free Action Proposal Network with Uncertainty Estimation'. Together they form a unique fingerprint.
