Abstract
Generating code is an important application of Large Language Models (LLMs), and the task of function completion is one of the core open challenges in this context. Existing approaches focus on training, fine-tuning, or prompting LLMs to generate better outputs given the same input. We propose a novel and complementary approach: to optimize part of the input, the docstring (summary of a function’s purpose and usage), via reformulation with an LLM, in order to improve code generation. We develop two baseline methods for optimizing code generation via docstring reformulation and test them on the original HumanEval benchmark and multiple curated variants which are made more challenging by realistically worsening the docstrings. Our results show that, when operating on docstrings reformulated by an LLM instead of the original (or worsened) inputs, the performance of a number of open-source LLMs does not change significantly. This finding demonstrates an unexpected robustness of current open-source LLMs to the details of the docstrings. We conclude by examining a series of questions, accompanied by in-depth analyses, pertaining to the sensitivity of current open-source LLMs to the details in the docstrings, the potential for improvement via docstring reformulation, and the limitations of the methods employed in this work.
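The abstract describes the approach only at a high level. The sketch below illustrates the general idea of docstring reformulation under our own assumptions, and it does not reproduce either of the paper's two baseline methods. `reformulate` and `complete` are hypothetical stand-ins for calls to a reformulation LLM and a code-generation LLM. The prompt is assumed to be a HumanEval-style function signature followed by its docstring. Only the docstring literal is swapped, so the signature and every other part of the input stay identical.

```python
import ast
from typing import Callable, Tuple


def _find_function(tree: ast.Module, func_name: str) -> ast.FunctionDef:
    """Locate the first function definition named `func_name`."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return node
    raise ValueError(f"function {func_name!r} not found")


def replace_docstring(prompt: str, func_name: str, new_doc: str) -> str:
    """Return `prompt` with the docstring of `func_name` replaced by `new_doc`."""
    fn = _find_function(ast.parse(prompt), func_name)
    first = fn.body[0]
    if not (isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)):
        raise ValueError(f"function {func_name!r} has no docstring")
    # Exact source text of the original docstring literal (Python 3.8+).
    old_literal = ast.get_source_segment(prompt, first.value)
    indent = " " * first.col_offset
    # Assumption: swap stray triple quotes so the new literal stays well-formed.
    safe_doc = new_doc.strip().replace('"""', "'''")
    body = "\n".join(indent + line if line else line
                     for line in safe_doc.splitlines())
    new_literal = f'"""\n{body}\n{indent}"""'
    return prompt.replace(old_literal, new_literal, 1)


def reformulate_then_complete(
    prompt: str,
    func_name: str,
    reformulate: Callable[[str], str],  # hypothetical reformulation LLM
    complete: Callable[[str], str],     # hypothetical code-generation LLM
) -> Tuple[str, str]:
    """Rewrite the docstring, then ask the code model for the function body."""
    fn = _find_function(ast.parse(prompt), func_name)
    original_doc = ast.get_docstring(fn) or ""
    new_prompt = replace_docstring(prompt, func_name, reformulate(original_doc))
    return new_prompt, complete(new_prompt)


if __name__ == "__main__":
    PROMPT = '''def add(a: int, b: int) -> int:
    """Add two numbers.
    >>> add(2, 3)
    5
    """
'''
    # Stub "LLMs" so the example runs offline.
    new_prompt, body = reformulate_then_complete(
        PROMPT, "add",
        reformulate=lambda doc: "Return the sum of integers a and b.\n>>> add(2, 3)\n5",
        complete=lambda p: "    return a + b\n",
    )
    print(new_prompt + body)
```

Comparing generated code for the original and the reformulated prompt (for example, on HumanEval's unit tests) isolates the effect of the docstring alone, because nothing else in the input changes.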
Original language | English |
---|---|
Title of host publication | EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop |
Editors | Neele Falk, Sara Papi, Mike Zhang |
Publisher | Association for Computational Linguistics |
Pages | 296-312 |
Number of pages | 17 |
ISBN (Electronic) | 979-8-89176-090-5 |
Publication status | Published - 2024 |
MoE publication type | A4 Conference publication |
Event | Conference of the European Chapter of the Association for Computational Linguistics, St. Julian's, Malta; Duration: 17 Mar 2024 → 22 Mar 2024; Conference number: 18 |
Conference
Conference | Conference of the European Chapter of the Association for Computational Linguistics |
---|---|
Abbreviated title | EACL |
Country/Territory | Malta |
City | St. Julian's |
Period | 17/03/2024 → 22/03/2024 |
Fingerprint
Dive into the research topics of 'Can docstring reformulation with an LLM improve code generation?'. Together they form a unique fingerprint.
Projects
1 finished project
- Finnish Center for Artificial Intelligence
  Kaski, S. (Principal investigator)
  01/01/2019 → 31/12/2022
  Project: Academy of Finland: Other research funding