Improving Epilepsy Diagnosis with Diffusion Models and Precision-Focused Optimization

Authors: Lantian Zhang, Duong Nhu, Yun Zhao, Emma Foster, Lyn Millist, Shobi Sivathamboo, Patrick Kwan, Zongyuan Ge, Lan Du, and Levin Kuhlmann (corresponding author)

Published: International Journal of Neural Systems, Vol. 36, No. 8 (2026)
DOI: 10.1142/S0129065726500218

The Challenge: Needle in a Haystack

Epilepsy affects over 50 million people worldwide, and one of the most important tools for diagnosis is the Electroencephalogram (EEG). Clinicians look for specific patterns called Interictal Epileptiform Discharges (IEDs) — brief, abnormal electrical spikes that occur between seizures and serve as the most reliable electrophysiological biomarkers of epilepsy.

The problem? These IEDs are extraordinarily rare in EEG recordings. They are vastly outnumbered by normal background brain activity, creating a severe class imbalance that makes automated detection extremely challenging. Most existing AI systems, when deployed on data from a hospital other than the one they were trained on, produce an unacceptable number of false positives, flagging normal brain activity as epileptiform. Neurologists then spend valuable time reviewing detections that turn out to be false alarms, undermining the very efficiency these tools are designed to provide.

“The critical question in clinical practice is whether the patient’s EEG contains IEDs or not. False positives result in more inspection time by the clinician, directly reducing the practical value of automated detection systems.”

Our Approach: A Dual Strategy

This work from the brAIn Lab at Monash University tackles the false positive problem from two complementary angles: generating more realistic training data using diffusion probabilistic models, and directly optimizing for precision through AUPRC maximization. Together, these techniques form a unified framework that substantially improves the reliability of IED detection, particularly when the model encounters data from hospitals it has never seen before.

Strategy 1: Synthesizing Realistic IEDs with Diffusion Models

Diffusion probabilistic models work through a two-phase process. In the forward phase, real IED signals are gradually corrupted with noise over many steps until they become pure Gaussian noise. In the reverse phase, a neural network learns to reverse this process — starting from random noise and iteratively denoising it to produce realistic synthetic IED waveforms.
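The two phases can be sketched with the standard DDPM equations: the forward phase samples x_t directly from x_0 as sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, and each reverse step subtracts the noise predicted by a network. The sketch below is a minimal pure-Python illustration, not the authors' code. It assumes the common DDPM linear schedule with β running from 1e-4 to 0.02 (the post only says the schedule is linear with 1000 steps), and `predict_noise` is a hypothetical stand-in for the trained denoiser (EEGWave in the paper).

```python
import math
import random

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def forward_noise(x0, t, abar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form; returns (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def reverse_step(xt, t, betas, abar, predict_noise, rng):
    """One ancestral DDPM step x_t -> x_{t-1}, using sigma_t^2 = beta_t."""
    beta = betas[t]
    eps_hat = predict_noise(xt, t)  # hypothetical denoiser: predicts the added noise
    coef = beta / math.sqrt(1.0 - abar[t])
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no fresh noise is added at the final step
    return [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]

def sample(length, betas, predict_noise, seed=0):
    """Start from pure Gaussian noise and denoise through all T steps."""
    rng = random.Random(seed)
    abar = alpha_bars(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for t in reversed(range(len(betas))):
        x = reverse_step(x, t, betas, abar, predict_noise, rng)
    return x
```

With a dummy predictor such as `lambda x, t: [0.0] * len(x)`, `sample(256, linear_betas(), ...)` runs end to end but returns noise rather than IED-like waveforms: all of the realism comes from the trained denoiser.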

The team used the EEGWave architecture, a specialized diffusion model designed for EEG signal generation. The architecture employs 40 residual blocks with exponentially increasing dilation rates, allowing it to capture both the sharp, transient spikes characteristic of IEDs and longer-term periodic discharge patterns. The model uses a 1000-step linear noise schedule to progressively generate high-fidelity synthetic EEG segments.
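Exponentially increasing dilations are what let a stack of short convolutions see both a sharp spike and a longer periodic pattern. The post does not give EEGWave's kernel size or dilation cycle, so this sketch assumes DiffWave-style settings (kernel size 3, dilations doubling from 1 to 512 and then repeating) purely to show how the receptive field of 40 blocks is computed; both values are assumptions.

```python
def dilations(num_blocks=40, cycle=10):
    """Dilation of each residual block: 1, 2, 4, ..., 2**(cycle - 1), then repeat.

    The cycle length of 10 is borrowed from DiffWave and is an assumption here.
    """
    return [2 ** (i % cycle) for i in range(num_blocks)]

def receptive_field(dils, kernel_size=3):
    """Receptive field (in samples) of a stack of dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dils)

d = dilations()
print(d[:12])              # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1, 2]
print(receptive_field(d))  # 8185
```

Dividing the result by the sampling rate converts it to seconds. Cycling the dilations keeps the field bounded: if they kept doubling across all 40 blocks, the receptive field would be on the order of 10^12 samples.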

By combining these synthesized IEDs with the real data, the team could fully balance the training dataset, directly addressing the data scarcity and class imbalance problems that plague EEG-based detection systems.
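Balancing a dataset with generated samples amounts to topping up each minority class to the size of the largest one. A minimal sketch; the class counts below are hypothetical and are not the TUEV statistics.

```python
def synthetic_quota(class_counts, target=None):
    """How many synthetic samples each class needs to reach `target`.

    By default the target is the size of the largest class, so the
    resulting training set is fully balanced.
    """
    target = max(class_counts.values()) if target is None else target
    return {c: max(0, target - n) for c, n in class_counts.items()}

# Hypothetical counts, for illustration only.
counts = {"background": 20000, "PLED": 1500, "GPED": 3000}
quota = synthetic_quota(counts)
print(quota)  # {'background': 0, 'PLED': 18500, 'GPED': 17000}
```

With ancestral sampling, every synthetic sample under a 1000-step schedule costs 1000 denoiser evaluations, so this hypothetical quota of 35,500 samples would need 35.5 million network passes (batching reduces wall-clock time, not the count). This is why diffusion sampling cost matters for deployment.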

Strategy 2: Directly Optimizing Precision with APLoss

Standard classification loss functions such as Binary Cross-Entropy (BCE) do not explicitly optimize for precision. They penalize false positives and false negatives symmetrically, so the model has no particular incentive to avoid false positives. This is especially problematic when the data is imbalanced: the model can achieve deceptively high accuracy simply by rarely predicting the minority class.
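A toy calculation makes the imbalance problem concrete. With a hypothetical 1% IED prevalence, a model that never flags anything scores 99% accuracy, while a detector that catches every IED but also raises 40 false alarms scores lower accuracy; only precision reveals how much of what it flags is noise.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, precision and recall (sensitivity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return acc, prec, rec

# Hypothetical: 10 IED segments among 1,000 (1% prevalence).
y_true = [1] * 10 + [0] * 990
never_flag = [0] * 1000                  # predicts "background" everywhere
noisy = [1] * 10 + [1] * 40 + [0] * 950  # catches every IED plus 40 false alarms
print(confusion_metrics(y_true, never_flag))  # (0.99, 0.0, 0.0)
print(confusion_metrics(y_true, noisy))       # (0.96, 0.2, 1.0)
```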

To address this, the team incorporated Average Precision Loss (APLoss), which maximizes a differentiable surrogate of average precision, a standard estimator of the Area Under the Precision-Recall Curve (AUPRC), and is trained within a compositional optimization framework. APLoss builds this surrogate from a squared hinge loss, and its optimization procedure comes with provable convergence guarantees. The effect is to push the model to make its positive predictions reliable, which is exactly what clinicians need.
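To see what a squared-hinge AP surrogate looks like, here is a minibatch sketch, not the authors' implementation. For each positive i, the precision at its rank is approximated by replacing the indicator [s_j >= s_i] with max(margin - (s_i - s_j), 0)^2. The sketch omits the moving-average (compositional) estimator on which the convergence guarantees rely, so it shows only the loss value. The `margin` and the auxiliary `weight` combining it with BCE are hypothetical choices.

```python
import math

def squared_hinge(margin, s_i, s_j):
    """Surrogate for the indicator [s_j >= s_i]: max(margin - (s_i - s_j), 0)**2."""
    return max(margin - (s_i - s_j), 0.0) ** 2

def ap_surrogate_loss(scores, labels, margin=1.0):
    """Negative mean of a smoothed average precision over the positives.

    Minibatch estimate only: no moving averages across batches.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    if not pos:
        return 0.0
    total = 0.0
    for i in pos:
        # "How many positives rank at or above i" over "how many samples do"
        num = sum(squared_hinge(margin, scores[i], scores[j]) for j in pos)
        den = sum(squared_hinge(margin, scores[i], s) for s in scores)
        total += num / den  # den >= margin**2 > 0 because j == i is included
    return -total / len(pos)

def bce(probs, labels, eps=1e-7):
    """Binary cross-entropy averaged over the batch."""
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for p, y in zip(probs, labels)) / len(labels)

def combined_loss(probs, labels, weight=1.0):
    """BCE plus the AP surrogate as an auxiliary term; `weight` is hypothetical."""
    return bce(probs, labels) + weight * ap_surrogate_loss(probs, labels)
```

For a perfectly ranked batch, such as scores [1, 1, 0, 0] with labels [1, 1, 0, 0], `ap_surrogate_loss` reaches its minimum of -1.0. Swapping the scores so that negatives outrank positives raises it to -0.2, which gives the optimizer a direct signal whenever a negative climbs above a positive.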

Key Results at a Glance

Metric	Within-Hospital (TUEV)	Cross-Hospital (Alfred)
Precision Improvement	+4.5%	+40.04%
F1-Score Improvement	+0.7%	+18.74%
AUC	0.99	0.95
AUPRC	0.98	0.93
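The table reports both AUC and AUPRC. As a sketch of how these are commonly computed from per-segment scores, ROC AUC is the probability that a random positive outranks a random negative, and AUPRC can be estimated as step-wise average precision (the same definition scikit-learn's `average_precision_score` uses). A classifier that scores at random has an AUPRC near the positive prevalence, compared with 0.5 for ROC AUC, which is why AUPRC is more sensitive to false positives under imbalance. The example scores are made up.

```python
def roc_auc(scores, labels):
    """Probability a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Step-wise AUPRC: sum over positives of (recall gain) x (precision there).

    Assumes no tied scores; tied scores would need to be grouped.
    """
    ranked = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    n_pos = sum(labels)
    tp = fp = 0
    ap = 0.0
    for _, y in ranked:
        if y == 1:
            tp += 1
            ap += (tp / (tp + fp)) / n_pos  # recall rises by 1/n_pos here
        else:
            fp += 1
    return ap

s = [0.9, 0.8, 0.7, 0.6, 0.5]
l = [1, 0, 0, 1, 0]
print(round(roc_auc(s, l), 4), round(average_precision(s, l), 4))  # 0.6667 0.75
```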

Key Contributions

🧬 Diffusion-Based Data Synthesis: A generative framework using Denoising Diffusion Probabilistic Models (EEGWave architecture) to synthesize realistic, normalized IED samples — effectively addressing data scarcity, class imbalance, and quality issues in EEG datasets.

🎯 AUPRC Maximization: Integration of Average Precision Loss (APLoss) as an auxiliary objective function, directly optimizing for precision and dramatically reducing the false positives that waste clinician time during EEG review.

🏥 Cross-Hospital Generalization: Models trained on the TUEV corpus showed large improvements when evaluated on independent Alfred Hospital data, with a 40% relative precision gain (from 55.1% to 77.2%) and an 18.7% F1-score improvement, supporting their promise for real-world clinical use.

📊 Within-Hospital Excellence: On the TUEV evaluation dataset, the combined approach achieved the highest AUPRC (0.98), with precision reaching 94.8% and F1-score at 92.7%, outperforming baseline models and alternative augmentation strategies.

⚙️ Clinical Deployment Analysis: Comprehensive evaluation of inference latency, FLOPs, model parameters, and diffusion sampling costs — providing practical insights for deploying these systems in resource-constrained clinical environments.

Deeper Dive: What Makes This Work

Quality of Synthetic Data

Not all synthetic data is created equal. The team rigorously evaluated their generated IEDs using three metrics: Fréchet Inception Distance (FID) for overall quality, density for fidelity, and coverage for diversity. Synthesized periodic lateralized epileptiform discharge (PLED) and generalized periodic epileptiform discharge (GPED) signals achieved excellent quality scores (FID of 0.10 and 0.21, respectively), with density near or above 1.0 and coverage near or above 0.78. This indicates that the synthetic data closely matched the real distribution while introducing meaningful variety.
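Density and coverage are both built on k-nearest-neighbour balls around the real samples. Density counts how many synthetic samples fall inside those balls, normalized so that it can exceed 1 when fakes cluster in dense real regions. Coverage is the fraction of real balls that contain at least one synthetic sample. A minimal sketch of the usual definitions (Naeem et al., 2020) on plain feature vectors; the feature extractor the authors used and their choice of k are not given in the post, so k = 5 is an assumption.

```python
import math

def kth_nn_radius(points, k):
    """Distance from each point to its k-th nearest neighbour (excluding itself).

    Requires len(points) > k.
    """
    radii = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(d[k - 1])
    return radii

def density_coverage(real, fake, k=5):
    """Density and coverage of `fake` relative to `real` (lists of vectors)."""
    radii = kth_nn_radius(real, k)
    # inside[i][j] is True when fake sample j lies within the k-NN ball of real sample i
    inside = [[math.dist(r, f) < rad for f in fake] for r, rad in zip(real, radii)]
    density = sum(sum(row) for row in inside) / (k * len(fake))
    coverage = sum(any(row) for row in inside) / len(real)
    return density, coverage
```

Passing the real set as its own "fake" set gives a coverage of 1.0. A synthetic set that lands far from every real sample scores 0 on both metrics, however realistic each sample looks in isolation.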

The Precision–Sensitivity Trade-off

An important finding of this work is the principled trade-off between precision and sensitivity. When APLoss is applied, the model becomes more conservative in its positive predictions — it flags fewer events as IEDs, but the ones it does flag are much more likely to be correct. In the cross-hospital evaluation, this translated to precision jumping from 55.1% to 77.2%, while sensitivity decreased modestly from 95.2% to 89.3%. For clinical practice, this trade-off is highly favorable: it is far better to reliably detect most IEDs with very few false alarms than to catch every possible IED while burying the clinician in hundreds of false positives.
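The cross-hospital figures can be turned into absolute counts. Since precision P = TP / (TP + FP), the number of false positives is FP = TP (1 - P) / P. The sketch assumes a hypothetical recording containing 1,000 true IEDs; that count is purely illustrative, and the results scale linearly with it.

```python
def false_positives(true_events, sensitivity, precision):
    """Expected detections and false alarms given sensitivity and precision.

    Detected true events: TP = sensitivity * true_events.
    Precision P = TP / (TP + FP)  =>  FP = TP * (1 - P) / P.
    """
    tp = sensitivity * true_events
    return tp, tp * (1.0 - precision) / precision

# Hypothetical recording with 1,000 true IEDs, using the cross-hospital figures.
for name, sens, prec in [("without APLoss", 0.952, 0.551), ("with APLoss", 0.893, 0.772)]:
    tp, fp = false_positives(1000, sens, prec)
    print(f"{name}: {tp:.0f} IEDs detected, {fp:.0f} false positives")
# without APLoss: 952 IEDs detected, 776 false positives
# with APLoss: 893 IEDs detected, 264 false positives
```

Under these assumptions, APLoss cuts false positives by roughly two thirds (about 776 to 264) at the cost of 59 additional missed IEDs. The ratio 77.2 / 55.1 ≈ 1.40 also reproduces the roughly 40% relative precision gain reported in the table.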

Ablation Studies and Design Choices

The paper includes thorough ablation experiments comparing noise schedules (linear, cosine, sigmoid), diffusion sampling steps (100–2000), and augmentation strategies (diffusion-based, SMOTE, Mixup). The linear schedule with 1000 steps emerged as the best balance of fidelity, diversity, and computational efficiency. Notably, simple augmentation strategies like Mixup showed limited improvement and even degraded AUPRC, while SMOTE achieved reasonable precision but lower sensitivity. The diffusion-based approach combined with APLoss consistently delivered the best overall F1-scores across all evaluation settings.
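One way to see how the three schedules differ is to compare the cumulative signal level ᾱ_t that each leaves at a given step. This sketch makes three assumptions: DDPM-default linear endpoints (β from 1e-4 to 0.02), the cosine form of Nichol and Dhariwal (2021) with s = 0.008, and one common sigmoid variant that squashes β between the same endpoints. The post does not state which parameterizations the authors used.

```python
import math

T = 1000

def cumprod_one_minus(betas):
    """alpha_bar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def linear_abar(T, b0=1e-4, b1=0.02):
    """Betas spaced linearly from b0 to b1."""
    return cumprod_one_minus([b0 + (b1 - b0) * i / (T - 1) for i in range(T)])

def cosine_abar(T, s=0.008):
    """alpha_bar_t = f(t) / f(0) with f(t) = cos(((t/T + s) / (1 + s)) * pi/2)**2."""
    f = lambda t: math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(1, T + 1)]

def sigmoid_abar(T, b0=1e-4, b1=0.02):
    """Betas following a sigmoid over [-6, 6], rescaled to [b0, b1] (one variant)."""
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    xs = [-6 + 12 * i / (T - 1) for i in range(T)]
    return cumprod_one_minus([b0 + (b1 - b0) * sig(x) for x in xs])

for name, abar in [("linear", linear_abar(T)), ("cosine", cosine_abar(T)),
                   ("sigmoid", sigmoid_abar(T))]:
    print(f"{name:8s} alpha_bar at t = T/2: {abar[T // 2 - 1]:.3f}")
```

For T = 1000, the cosine schedule still retains ᾱ close to 0.5 at the midpoint, while the linear one has already dropped below 0.1, so the schedules spend their step budget very differently. Which trade-off works best is an empirical question; in this ablation, the linear schedule with 1000 steps came out ahead.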

Why It Matters

This research addresses a genuine bottleneck in clinical neurology. EEG interpretation is time-consuming, requires specialized expertise, and suffers from significant inter-rater variability even among experts. An automated system that can reliably flag IEDs without overwhelming clinicians with false positives has the potential to transform epilepsy diagnosis — particularly in resource-limited settings where expert neurologists are scarce.

The cross-hospital generalization results are especially encouraging. Many AI systems for medical applications perform well on data similar to their training set but degrade sharply on data from other institutions with different recording protocols, equipment, and patient populations. The roughly 40% relative improvement in cross-hospital precision demonstrated here suggests that the combination of diffusion-based augmentation and precision-focused optimization creates models that are substantially more robust to this kind of distributional shift.

As the authors note, the ultimate clinical goal isn’t to identify every single IED in a recording — it’s to determine whether a patient’s EEG contains IEDs at all. A high-precision system that confidently identifies the presence of IEDs, even if it misses a few individual discharges, is far more clinically useful than one that flags everything but requires extensive manual review to separate signal from noise.
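The recording-level argument can be sketched as a simple aggregation rule: call a recording positive if enough segments cross a threshold. Both the threshold and `min_hits` are hypothetical settings, not taken from the paper, and the probabilities are made up.

```python
def recording_has_ieds(segment_probs, threshold=0.5, min_hits=1):
    """Recording-level call: positive if enough segments exceed the threshold."""
    hits = sum(p >= threshold for p in segment_probs)
    return hits >= min_hits

# Hypothetical recordings, each a list of per-segment IED probabilities.
epileptic = [0.1, 0.92, 0.2, 0.4, 0.88, 0.05]  # the 0.4 segment is a missed IED
normal = [0.1, 0.3, 0.2, 0.15, 0.05, 0.25]     # no segment crosses the threshold
print(recording_has_ieds(epileptic), recording_has_ieds(normal))  # True False
```

Missing one discharge leaves the epileptic recording correctly flagged. By contrast, a single false positive in the normal recording would flip it to positive. At the recording level, precision therefore matters more than catching every individual IED.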

Looking Ahead

Future directions include enhancing the diffusion model architecture and guidance strategies to further improve synthesized sample quality, exploring conditional generation approaches, and validating the framework on larger multi-center datasets. The team also plans to investigate Dynamic Time Warping for more nuanced evaluation of synthesized waveform quality, and to study the relationship between generative model metrics and downstream classifier performance more systematically.
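Dynamic Time Warping compares two waveforms after allowing a non-linear alignment in time, so a synthetic spike that is slightly shifted is not penalized the way a point-wise distance would penalize it. Below is a sketch of the classic algorithm with an absolute-difference cost and no warping window. How the team intends to apply DTW is not described in the post, and the toy waveforms are made up.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost.

    O(len(a) * len(b)) time and memory; no warping-window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# A spike and the same spike shifted by two samples (toy waveforms).
spike = [0, 0, 1, 5, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 5, 1, 0]
euclid = sum((x - y) ** 2 for x, y in zip(spike, shifted)) ** 0.5
print(dtw_distance(spike, shifted), round(euclid, 2))  # 0.0 7.21
```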

Read the Full Paper

📄 Paper: Interictal Epileptiform Discharge Detection Through Probabilistic Diffusion Models with Maximization of Precision Recall Metrics

Published in: International Journal of Neural Systems, Vol. 36, No. 8 (2026)

brAIn Lab · Faculty of Information Technology · Monash University

Tags: #brAInLab #Epilepsy #EEG #DiffusionModels #DeepLearning #MedicalAI #NeuralSystems #AIResearch #MachineLearning #ClinicalNeurophysiology
