DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

Yi Hao Peng*, Faria Huq, Yue Jiang, Jason Wu, Xin Yue Li, Jeffrey P. Bigham, Amy Pavel

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

1 Citation (Scopus)

Abstract

Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types.
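The abstract's central idea is that visuals produced from code come with their labels already attached, because the program that draws each element already knows what it is. The minimal sketch below illustrates only that idea and is not the authors' DreamStruct pipeline. A seeded random layout generator stands in for the code-generation step, and `Element`, `generate_slide`, `to_html`, and `to_labels` are hypothetical names chosen for illustration. A single layout specification produces both the slide markup and the matching element-type labels and bounding boxes.

```python
import random
from dataclasses import dataclass
from html import escape


@dataclass
class Element:
    """One visual element; `kind` doubles as its ground-truth label (hypothetical schema)."""
    kind: str  # e.g. "title", "text", "image"
    x: int
    y: int
    w: int
    h: int
    content: str


def generate_slide(seed: int, width: int = 960, height: int = 540) -> list[Element]:
    """Hypothetical stand-in for code generation: a title plus 1-3 stacked body elements."""
    rng = random.Random(seed)  # seeded so the same seed yields the same slide
    elements = [Element("title", 40, 30, width - 80, 60, f"Slide {seed}")]
    y = 110
    for i in range(rng.randint(1, 3)):
        kind = rng.choice(["text", "image"])
        h = rng.randint(60, 120)
        if y + h > height - 20:  # stop before an element would overflow the slide
            break
        elements.append(Element(kind, 40, y, width - 80, h, f"{kind} {i}"))
        y += h + 20
    return elements


def to_html(elements: list[Element], width: int = 960, height: int = 540) -> str:
    """Render the layout as absolutely positioned HTML; boxes match the labels exactly."""
    parts = [f'<div style="position:relative;width:{width}px;height:{height}px">']
    for e in elements:
        style = f"position:absolute;left:{e.x}px;top:{e.y}px;width:{e.w}px;height:{e.h}px"
        parts.append(f'<div class="{e.kind}" style="{style}">{escape(e.content)}</div>')
    parts.append("</div>")
    return "\n".join(parts)


def to_labels(elements: list[Element]) -> list[dict]:
    """Labels come for free from the same specification: no manual annotation step."""
    return [{"label": e.kind, "bbox": [e.x, e.y, e.w, e.h]} for e in elements]


if __name__ == "__main__":
    slide = generate_slide(seed=7)
    print(to_html(slide))
    print(to_labels(slide))
```

Because every bounding box is fixed before rendering, each generated example is paired with element-recognition labels at no annotation cost. A fuller setup would render the markup (for instance in a headless browser) to obtain training images. The real method's use of code generation to produce varied slide and UI content is not reproduced here.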

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer
Pages: 466-485
Number of pages: 20
ISBN (Print): 978-3-031-72690-3
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: European Conference on Computer Vision - Milano, Italy
Duration: 29 Sept 2024 to 4 Oct 2024
Conference number: 18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15082 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Computer Vision
Abbreviated title: ECCV
Country/Territory: Italy
City: Milano
Period: 29/09/2024 to 04/10/2024

Keywords

  • Synthetic Data
  • Transfer Learning
  • Visual Design
