Can docstring reformulation with an LLM improve code generation?

Research output: Article in book/conference proceedings, Conference article in proceedings, Scientific, peer-reviewed

Abstract

Generating code is an important application of Large Language Models (LLMs) and the task of function completion is one of the core open challenges in this context. Existing approaches focus on either training, fine-tuning or prompting LLMs to generate better outputs given the same input. We propose a novel and complementary approach: to optimize part of the input, the docstring (summary of a function’s purpose and usage), via reformulation with an LLM, in order to improve code generation. We develop two baseline methods for optimizing code generation via docstring reformulation and test them on the original HumanEval benchmark and multiple curated variants which are made more challenging by realistically worsening the docstrings. Our results show that, when operating on docstrings reformulated by an LLM instead of the original (or worsened) inputs, the performance of a number of open-source LLMs does not change significantly. This finding demonstrates an unexpected robustness of current open-source LLMs to the details of the docstrings. We conclude by examining a series of questions, accompanied by in-depth analyses, pertaining to the sensitivity of current open-source LLMs to the details in the docstrings, the potential for improvement via docstring reformulation and the limitations of the methods employed in this work.
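
As a rough illustration of the idea described in the abstract, the sketch below swaps the docstring of a HumanEval-style prompt (a function signature followed by a docstring) for a reformulated version before the prompt would be handed to a code-generation model. It is not the authors' implementation: `reformulate_prompt`, `toy_reformulator` and the example prompt are hypothetical names introduced here, and the reformulator is an ordinary Python callable standing in for an LLM call. The sketch assumes Python 3.8 or later and a prompt whose first function has a docstring that starts on its own line.

```python
import ast
from typing import Callable


def reformulate_prompt(prompt: str, reformulate: Callable[[str], str]) -> str:
    """Replace the first function's docstring in `prompt` with a reformulated one."""
    tree = ast.parse(prompt)
    # Assumption: the first function found is the one to be completed.
    func = next(
        (n for n in ast.walk(tree)
         if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))),
        None,
    )
    if func is None:
        return prompt
    original = ast.get_docstring(func)
    if original is None:
        return prompt  # nothing to reformulate

    doc_node = func.body[0]  # expression statement holding the docstring
    indent = " " * doc_node.col_offset
    # Assumption: the reformulated text contains no triple quotes.
    new_doc = reformulate(original)
    new_block = [indent + '"""']
    new_block += [indent + line if line else "" for line in new_doc.splitlines()]
    new_block.append(indent + '"""')

    lines = prompt.splitlines()
    # lineno and end_lineno are 1-indexed and inclusive.
    lines[doc_node.lineno - 1:doc_node.end_lineno] = new_block
    return "\n".join(lines) + "\n"


def toy_reformulator(docstring: str) -> str:
    # Stand-in for an LLM call; a real reformulator would rewrite the input text.
    return "Return the sum of a and b."


example_prompt = (
    'def add(a, b):\n'
    '    """Add two numbers.\n'
    '\n'
    '    >>> add(1, 2)\n'
    '    3\n'
    '    """\n'
)
rewritten = reformulate_prompt(example_prompt, toy_reformulator)
print(rewritten)
```

In this sketch the signature is left untouched and only the docstring text changes, so the same code-generation model can be run on both the original and the rewritten prompt and the results compared.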

Original language: English
Title of host publication: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop
Editors: Neele Falk, Sara Papi, Mike Zhang
Publisher: Association for Computational Linguistics
Pages: 296-312
Number of pages: 17
ISBN (electronic): 979-8-89176-090-5
Publication status: Published - 2024
OKM publication type: A4 Article in conference proceedings
Event: Conference of the European Chapter of the Association for Computational Linguistics - St. Julian's, Malta
Duration: 17 Mar 2024 to 22 Mar 2024
Conference number: 18

Conference

Conference: Conference of the European Chapter of the Association for Computational Linguistics
Abbreviated title: EACL
Country/Territory: Malta
City: St. Julian's
Period: 17/03/2024 to 22/03/2024
