DoodleFormer: Creative Sketch Drawing with Transformers

Ankan Kumar Bhunia*, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen, Michael Felsberg

*Corresponding author for this work

Research output: Article in book/conference proceedings, Conference contribution, Scientific, peer-reviewed

Abstract

Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of coarse sketch composition followed by the incorporation of fine details in the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative and human-based evaluations show that DoodleFormer outperforms the state of the art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in Fréchet inception distance (FID) over the state of the art. We also demonstrate the effectiveness of DoodleFormer for related applications of text to creative sketch generation, sketch completion and house layout generation. Code is available at: https://github.com/ankanbhunia/doodleformer.

Original language: English
Title of host publication: Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher: Springer
Pages: 338-355
Number of pages: 18
ISBN (print): 9783031197895
Status: Published - 2022
MoE publication type: A4 Article in conference proceedings
Event: European Conference on Computer Vision - Tel Aviv, Israel
Duration: 23 Oct 2022 to 27 Oct 2022
Conference number: 17
https://eccv2022.ecva.net

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer
Volume: 13677 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: European Conference on Computer Vision
Abbreviated title: ECCV
Country/Territory: Israel
City: Tel Aviv
Period: 23/10/2022 to 27/10/2022