MixupE: Understanding and improving Mixup from directional derivative perspective

Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
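
The interpolation step described in the abstract can be illustrated with a minimal NumPy sketch of vanilla Mixup. This is not the improved method proposed in the paper, whose details the abstract does not give. The function name `mixup_batch`, the `alpha` parameter, and the choice to draw the mixing coefficient from a Beta(alpha, alpha) distribution and to pair samples by shuffling the batch are assumptions based on common Mixup practice, not on this record.

```python
import numpy as np


def mixup_batch(x, y, alpha=1.0, rng=None):
    """Vanilla Mixup on one batch (illustrative sketch, not MixupE).

    x: array of shape (batch, ...) holding the inputs.
    y: array of shape (batch, num_classes) holding one-hot or soft labels.
    alpha: Beta distribution parameter (assumed; common practice).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Mixing coefficient in [0, 1], shared by the whole batch.
    lam = rng.beta(alpha, alpha)

    # Pair every sample with a randomly chosen partner from the same batch.
    perm = rng.permutation(len(x))

    # Linearly interpolate the inputs and their labels with the same weight.
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam
```

Calling `mixup_batch(x, y, alpha=0.4)` on a batch returns mixed inputs and labels of the same shapes as `x` and `y`; when the rows of `y` are one-hot, each row of the mixed labels still sums to 1.
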
Original language: English
Title of host publication: Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)
Publisher: JMLR
Pages: 2597-2607
Publication status: Published - Aug 2023
MoE publication type: A4 Conference publication
Event: Conference on Uncertainty in Artificial Intelligence - Pittsburgh, United States
Duration: 31 Jul 2023 to 4 Aug 2023
Conference number: 39

Publication series

Name: Proceedings of Machine Learning Research
Publisher: PMLR
Volume: 216
ISSN (Print): 2640-3498

Conference

Conference: Conference on Uncertainty in Artificial Intelligence
Abbreviated title: UAI
Country/Territory: United States
City: Pittsburgh
Period: 31/07/2023 to 04/08/2023
