Abstract
Enabling machines to understand structured visuals such as slides and user interfaces is essential for making these visuals accessible to people with disabilities. However, achieving such understanding computationally has so far required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method that generates synthetic, structured visuals with target labels through code generation. Our method lets people create datasets with built-in labels and train models using only a small number of human-annotated examples. We demonstrate performance improvements on three slide- and UI-understanding tasks: recognizing visual elements, describing visual content, and classifying visual content types.
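To make the core idea concrete, the following is a minimal sketch of what code-generated training data with built-in labels could look like. It is an illustrative assumption, not the authors' pipeline: the Pillow-based renderer, element types, layout constants, and file names are all hypothetical, and it only shows that when a visual is produced by code, the element bounding boxes and types are known by construction.

```python
# Hypothetical sketch: render a synthetic slide-like image in code and emit
# the element labels that come "for free" from the generation process.
# The renderer, layout values, and label schema are illustrative assumptions.
import json
import random
from PIL import Image, ImageDraw


def make_synthetic_slide(width=1280, height=720, seed=0):
    random.seed(seed)
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    labels = []

    # Title element: its position, text, and type are chosen programmatically,
    # so the bounding box and label require no human annotation.
    title_box = (80, 60, width - 80, 140)
    draw.rectangle(title_box, outline="black")
    draw.text((title_box[0] + 10, title_box[1] + 10), "Synthetic Title", fill="black")
    labels.append({"type": "title", "bbox": title_box, "text": "Synthetic Title"})

    # Body element with a random number of bullet lines.
    body_box = (80, 180, width - 80, height - 80)
    draw.rectangle(body_box, outline="gray")
    for i in range(random.randint(2, 4)):
        line = f"- bullet point {i + 1}"
        draw.text((body_box[0] + 10, body_box[1] + 10 + 40 * i), line, fill="black")
    labels.append({"type": "body", "bbox": body_box})

    return img, labels


if __name__ == "__main__":
    image, labels = make_synthetic_slide()
    image.save("slide_0.png")          # the synthetic visual
    with open("slide_0.json", "w") as f:
        json.dump(labels, f, indent=2)  # its automatically derived labels
```

Pairs like `slide_0.png` and `slide_0.json` could then serve as pretraining data before fine-tuning on a small set of human-annotated examples, in the spirit of the transfer-learning setup described in the abstract.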
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2024 – 18th European Conference, Proceedings |
Editors | Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol |
Publisher | Springer |
Pages | 466-485 |
Number of pages | 20 |
ISBN (Print) | 978-3-031-72690-3 |
DOIs | |
Publication status | Published - 2024 |
MoE publication type | A4 Conference publication |
Event | European Conference on Computer Vision, Milano, Italy |
Duration | 29 Sept 2024 → 4 Oct 2024 |
Conference number | 18 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 15082 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | European Conference on Computer Vision |
---|---|
Abbreviated title | ECCV |
Country/Territory | Italy |
City | Milano |
Period | 29/09/2024 → 04/10/2024 |
Keywords
- Synthetic Data
- Transfer Learning
- Visual Design