Title: Deep Learning Fault Localisation and Repair: Benchmarks, Limitations, and the Role of LLMs
Abstract: As Deep Learning (DL) systems become increasingly pervasive in safety-critical and high-impact domains, the need for effective techniques to test, localise, and repair faults in Deep Neural Networks (DNNs) has never been greater. Over the past few years, numerous fault localisation (FL) and repair approaches have been proposed, leveraging both static and dynamic analyses, as well as rule-based heuristics. However, a fundamental question remains: how effective and reliable are these techniques in practice?
In this talk, I will present a comprehensive empirical investigation into the current state of fault localisation and repair for DL systems. First, I will discuss a large-scale comparative evaluation of state-of-the-art FL techniques, conducted on a benchmark comprising both real-world faults collected from bug reporting platforms and faults generated via mutation testing. Our findings reveal that current techniques struggle to achieve strong and consistent performance when evaluated against a single human-defined ground truth, raising concerns about how effectiveness is currently assessed. Next, I will examine the broader ecosystem of DL fault localisation and repair techniques, highlighting their strengths and limitations. I will then present an empirical study investigating whether Large Language Models (LLMs) can effectively localise and repair faults in DL systems. Our evaluation shows that LLMs demonstrate strong performance compared to existing approaches, suggesting that they may offer a promising direction for advancing automated DL debugging. Finally, I will address a critical but often overlooked issue: the realism and reproducibility of the existing DL fault benchmarks used to evaluate DL fault localisation and repair approaches. Through a manual analysis of hundreds of reported faults across widely used benchmarks, we find that only a limited subset satisfies strong realism criteria, and reproducibility remains a significant challenge. These findings raise important concerns about current evaluation practices and underscore the need for more rigorous assessment methodologies.
Bio: Gunel Jahangirova is a Lecturer (Assistant Professor) at King’s College London (KCL), United Kingdom. Prior to joining KCL, she was a Postdoctoral Researcher at Università della Svizzera Italiana (USI) in Lugano, Switzerland. She obtained her PhD through a joint programme between Fondazione Bruno Kessler (FBK) in Trento, Italy, and University College London (UCL), UK. Her research focuses on the automatic generation and evaluation of test oracles, error propagation in software systems, testing of deep learning systems, oracle design and quality metrics for autonomous vehicles, and the application of artificial intelligence to software engineering tasks.
Title: Verification of Embedded Software for Aeronautical Applications
Abstract: The development of embedded software for aeronautical applications is carried out in a highly regulated environment, with verification being one of the main activities. Software verification in this context encompasses reviews, analyses, and testing. This presentation aims to provide an overview of verification activities for embedded software for aeronautical applications, placing them within the broader context of aircraft verification.
Bio: Marcelo Lemes holds a degree in Data Processing Technology and Mathematics from the University of Taubaté. He also holds a Master's degree in Software Engineering from the Aeronautics Institute of Technology and a Doctorate in Digital Systems from the Polytechnic School of the University of São Paulo. He worked for 12 years at the Institute of Aeronautics and Space (IAE) of the Aerospace Technical Center (CTA), spending most of that time on the development of embedded software for the Brazilian Satellite Launch Vehicle (VLS). Since 1997, he has worked at EMBRAER on the development and certification of embedded software applications. He currently works alongside the company’s Chief Engineer, coordinating embedded software activities.