When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair

Wenqiang Luo, Jacky Wai Keung, Boyang Yang, He Ye, Claire Le Goues, Tegawendé F. Bissyandé, Haoye Tian*, Bach Le

*Corresponding author for this work

Research output: Contribution to journalArticleScientificpeer-review

Abstract

Software systems have been evolving rapidly and inevitably introducing bugs at an increasing rate, leading to significant maintenance costs. While large language models (LLMs) have demonstrated remarkable potential in enhancing software development and maintenance practices, particularly in automated program repair (APR), they rely heavily on high-quality code repositories. Most code repositories are proprietary assets that capture the diversity and nuances of real-world industry software practices, which public datasets cannot fully represent. However, obtaining such data from various industries is hindered by data privacy concerns, as companies are reluctant to share their proprietary codebases. There has also been no in-depth investigation of collaborative software development by learning from private and decentralized data while preserving data privacy for program repair.

To address the gap, we investigate federated learning as a privacy-preserving method for fine-tuning LLMs on proprietary and decentralized data to boost collaborative software development and maintenance. We use the private industrial dataset TutorCode for fine-tuning and the EvalRepair-Java benchmark for evaluation, and assess whether federated fine-tuning enhances program repair. We then further explore how code heterogeneity (i.e., variations in coding style, complexity, and embedding) and different federated learning algorithms affect bug fixing to provide practical implications for real-world software development collaboration. Our evaluation reveals that federated fine-tuning can significantly enhance program repair, achieving increases of up to 16.67% for Top@10 and 18.44% for Pass@10, even comparable to the bug-fixing capabilities of centralized learning. Moreover, the negligible impact of code heterogeneity implies that industries can effectively collaborate despite diverse data distributions. Different federated algorithms also demonstrate unique strengths across LLMs, suggesting that tailoring the optimization process to specific LLM characteristics can further improve program repair.
Original languageEnglish
JournalACM Transactions on Software Engineering and Methodology
DOIs
Publication statusE-pub ahead of print - 2025
MoE publication typeA1 Journal article-refereed

Fingerprint

Dive into the research topics of 'When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair'. Together they form a unique fingerprint.

Cite this