A Large Language Model Approach to Classify Flakiness in C++ Projects

Xin Sun, Daniel Ståhl and Kristian Sandahl, Linköping University, Sweden

Authors

Xin Sun, Daniel Ståhl and Kristian Sandahl, Linköping University, Sweden

Abstract

Regression testing plays a crucial role in software testing: it ensures that new modifications do not disrupt the existing functionality and behaviour of the software system. However, flaky tests undermine the reliability of regression testing results. In this paper, we propose an LLM-based approach for classifying, at the code level, the root cause of identified flaky tests in C++ projects. We compile a collection of flaky tests from C++ projects hosted on GitHub. We fine-tune the Mistral-7b, Llama2-7b and CodeLlama-7b models on this C++ dataset and on an existing Java dataset, and evaluate their performance. The results indicate that the models vary in performance on the C++ dataset, while remaining comparable to their performance on the Java dataset. Our results demonstrate the capability of LLMs to accurately classify flakiness in C++ and Java projects, providing a promising approach to making the debugging of flaky tests more efficient in practice.

Keywords

Software Testing, Flaky Tests, LLMs, Flakiness Classification

CS&IT Conference Proceedings