
An agentic vision-action framework for generative 3D architectural modeling from sketches

  • Ximing Zhong*
  • Jiadong Liang
  • Xianchuan Meng
  • Yingkai Li
  • Pia Fricker
  • Immanuel Koh
  • *Corresponding author for this work

Research output: Journal article › Article › Scientific › peer-reviewed

3 Citations (Web of Science)
44 Downloads (Pure)

Abstract

In recent years, advances in generative AI have enabled the direct generation of 3D models from sketches or images, offering new possibilities in architectural design. However, most current AI-driven modeling approaches still operate as “black boxes,” exhibiting issues such as opaque modeling processes, non-editable outputs, and a lack of semantic depth. In architectural design, ideal tools should not only support structured component generation and spatial reasoning but also facilitate iterative workflows and collaborative creation. To address these challenges, and inspired by the iterative design processes of human architects, we propose an agentic vision-action framework that helps architects derive controllable and explainable 3D models from simple sketches. The framework involves the collaboration of multiple AI agents, including a Vision Agent, a 3D Reasoning Agent, a Reflection Agent, and a Data-Driven 3D Layout Agent, that collectively support sketch interpretation, spatial reasoning, and the generation of editable, structured 3D models. By integrating vision-language models (VLMs) with data-driven techniques, the system predicts detailed 3D spatial layouts and enables intuitive modifications through both visual and language inputs. Experimental results show that our approach surpasses existing methods in sketch interpretation, spatial reasoning, and structured 3D model generation. The outputs are not only editable and semantically rich but also composed of interpretable and traceable modeling steps, highlighting the potential of AI to assist architects in explainable and controllable design workflows. Instead of replicating human cognition, the framework is designed to augment it by enabling iterative feedback loops that interpret ambiguity, co-evolve design intent, and support co-constructive human–AI collaboration.

Original language: English
Article number: 14780771251352950
Pages: 679-700
Number of pages: 22
Journal: International Journal of Architectural Computing
Volume: 23
Issue number: 3
Early online date: 28 Jul 2025
Status: Published - Sep 2025
MoE publication type: A1 Original article in a scientific journal
