FormaliSE 2026
Sun 12 - Mon 13 April 2026 Rio de Janeiro, Brazil
co-located with ICSE 2026

While most neural network verification tools are evaluated on image-based benchmarks, verifying models trained on preprocessed data that bears a complex, nonlinear relationship to the raw input poses additional challenges. We take speech recognition as an example, where spectrogram-based preprocessing obscures the correspondence between waveform perturbations and model inputs. We consider two convolutional neural networks for spoken digit classification, one operating directly on raw waveform inputs and the other on Mel spectrogram representations, each adversarially trained with Projected Gradient Descent (PGD) for varying perturbation budgets $\epsilon$. The models are evaluated for robustness using three state-of-the-art neural network verifiers: $\alpha,\beta$-CROWN, Marabou, and NeuralSAT. Results show that $\alpha,\beta$-CROWN outperforms the other verifiers; however, verification success declines significantly as $\epsilon$ increases, from approximately 94% of samples verified at $\epsilon = 0$ to only 21% at $\epsilon = 0.02$. Our case study highlights several practical challenges in neural network verification for this domain. By documenting these challenges, this work guides the development of verification-friendly neural networks and preprocessing design for audio and other domains that involve complex data preprocessing.
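To make the PGD adversarial setting concrete, the following is a minimal NumPy sketch of an $L_\infty$-bounded PGD attack, not the authors' training code: it repeatedly steps an input in the gradient-sign direction and projects back into the $\epsilon$-ball around the original input. The toy logistic model, its gradient, and all parameter values here are illustrative assumptions.

```python
import numpy as np

def pgd_attack(x, y, grad_fn, epsilon, alpha, steps):
    """L-infinity PGD: ascend the loss via gradient-sign steps,
    projecting back into the epsilon ball around the clean input x."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv, y)                       # gradient of loss w.r.t. input
        x_adv = x_adv + alpha * np.sign(g)          # signed ascent step
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project onto epsilon ball
    return x_adv

# Toy stand-in model (illustrative, not the paper's CNNs):
# f(x) = sigmoid(w . x), cross-entropy loss for label y in {0, 1}.
w = np.array([1.0, -2.0, 0.5])

def grad_fn(x, y):
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w  # d(loss)/dx for logistic cross-entropy

x = np.array([0.2, 0.1, -0.3])
x_adv = pgd_attack(x, y=1.0, grad_fn=grad_fn, epsilon=0.02, alpha=0.01, steps=5)
print(np.max(np.abs(x_adv - x)))  # perturbation stays within epsilon
```

During adversarial training, each minibatch is perturbed this way before the gradient update, so the network learns to classify correctly inside the entire $\epsilon$-ball, which is the same property the verifiers are later asked to prove.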