Authors
Xin Sun, Daniel Ståhl and Kristian Sandahl, Linköping University, Sweden
Abstract
Regression testing plays a crucial role in software testing because it ensures that new modifications do not disrupt the existing functionality and behaviour of the software system. However, the presence of flaky tests undermines the reliability of regression testing results. In this paper, we propose an LLM-based approach for classifying, at the code level, the root cause of identified flaky tests in C++ projects. We compile a comprehensive collection of flaky tests from C++ projects on GitHub. We fine-tune Mistral-7b, Llama2-7b and CodeLlama-7b models on this C++ dataset and on an existing Java dataset, and evaluate their performance. The results indicate that the models' performance varies on the C++ dataset but is comparable to their performance on the Java dataset. Our results demonstrate the exceptional capability of LLMs to accurately classify flakiness in C++ and Java projects, providing a promising approach to enhance the efficiency of debugging flaky tests in practice.
Keywords
Software Testing, Flaky Tests, LLMs, Flakiness Classification