New paper on efficient ensembling methods for improved image segmentation and synthesis

Scalable Ensembles Put a Confidence Meter on Medical-Image AI

MITRP PhD student Weijie Chen and MITRP PI Alan McMillan recently published the manuscript "Segmentation And Synthesis With Interpretable Scalable Ensembles – Uncertainty Estimation" (SASWISE-UE) in Computers in Biology and Medicine on June 2, 2025.

The problem: great models, opaque confidence

Deep-learning tools for image segmentation and synthesis are edging toward clinical use, yet most still act like black boxes—producing a single answer without telling radiologists how sure they are. Conventional ensemble techniques can provide that uncertainty signal, but training dozens of independent networks is painfully slow and hardware-hungry.

The idea: assemble “many” models out of one

We introduce SASWISE-UE (Segmentation And Synthesis With Interpretable Scalable Ensembles – Uncertainty Estimation). Starting from a single well-trained model checkpoint, we:

  1. Clone every block (encoder, bottleneck, decoder, or transformer) several times.
  2. Shuffle the clones so each forward pass activates a different path—effectively a new sub-model.
  3. Diversify those paths with a dual-loss strategy that rewards both accuracy and disagreement.
  4. Fuse the outputs with majority vote (labels) or the median (continuous HU values) and turn the path-to-path spread into an uncertainty map.
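To make these steps concrete, here is a minimal, hypothetical sketch of steps 1, 2, and 4 (this is not the released SASWISE-UE code, and names such as `BlockBankNet`, `clones_per_stage`, and `path` are illustrative): each stage of a toy network holds several interchangeable clone blocks, each forward pass picks one clone per stage, and the per-path outputs are fused with a voxel-wise median while their spread serves as the uncertainty map.

```python
import itertools
import torch
import torch.nn as nn

class BlockBankNet(nn.Module):
    """Toy network: each stage holds several interchangeable clone blocks."""
    def __init__(self, num_stages=3, clones_per_stage=2, width=16):
        super().__init__()
        self.stem = nn.Conv2d(1, width, 3, padding=1)
        self.stages = nn.ModuleList([
            nn.ModuleList([
                nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
                for _ in range(clones_per_stage)
            ])
            for _ in range(num_stages)
        ])
        self.head = nn.Conv2d(width, 1, 1)

    def forward(self, x, path):
        # `path` selects one clone index per stage, defining one sub-model.
        x = self.stem(x)
        for stage, idx in zip(self.stages, path):
            x = stage[idx](x)
        return self.head(x)

net = BlockBankNet(num_stages=3, clones_per_stage=2)
x = torch.randn(1, 1, 32, 32)

# All 2**3 = 8 paths through the shared blocks; a real model would sample paths.
paths = list(itertools.product(range(2), repeat=3))
with torch.no_grad():
    preds = torch.stack([net(x, p) for p in paths])  # (n_paths, 1, 1, 32, 32)

fused = preds.median(dim=0).values   # median fusion for continuous outputs
uncertainty = preds.std(dim=0)       # path-to-path spread as an uncertainty map
print(fused.shape, uncertainty.shape)
```

In the full method, the dual-loss training of step 3 is what pushes the clones to disagree constructively; in this sketch, randomly initialized clones merely stand in for that diversity.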

Because the template network is shared, compute and storage costs grow only linearly with the number of cloned blocks, while the number of possible sub-models grows exponentially: thousands of virtual experts for the price of one. A pruning routine trims under-performing blocks, shrinking the model while maintaining, or even boosting, accuracy.
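As a quick, purely illustrative count (the numbers below are hypothetical, not taken from the paper): with B block positions and K interchangeable clones per position, stored weights grow with K × B while the number of distinct paths grows as K^B.

```python
# Hypothetical sizes, for illustration only.
B, K = 4, 5                      # block positions, clones per position
stored_blocks = K * B            # weights you actually keep: 20
sub_models = K ** B              # distinct paths through them: 625
print(f"stored blocks: {stored_blocks}, possible sub-models: {sub_models}")
```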

Study results

| Task | Dataset | Backbone(s) | Metric | Baseline | SASWISE-UE | Δ |
| --- | --- | --- | --- | --- | --- | --- |
| Body-organ segmentation | BTCV (13 organs, CT) | U-Net | Mean Dice | 0.789 | 0.814 | +3.2 pts |
| MR → CT synthesis | In-house paired head MR/CT (660 cases) | U-Net | MAE (HU) | 89.43 | 88.17 | –1.26 HU |

Statistical comparisons used the Wilcoxon signed-rank test (α = 0.05, Bonferroni-corrected).

Why it matters

  • Built-in honesty: Voxel-wise uncertainty maps flag areas likely to be wrong, letting clinicians verify tricky boundaries or suspect HU ranges. Under simulated noise, undersampling, and other corruptions, error and uncertainty stayed strongly correlated (see the sketch after this list).
  • Hardware-friendly: One GPU, one initial training run, exponential ensemble size. Ideal for groups without giant compute clusters.
  • Architecture-agnostic: Works on convolutional (U-Net) and transformer (UNETR) backbones, hinting at broad applicability across imaging tasks.
  • Clinical trust layer: By spotlighting where the network lacks confidence, SASWISE-UE eases regulatory concerns about “silent failures” in real-world deployment.
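To illustrate how an error-versus-uncertainty check like the one above can be scored, here is a hypothetical sketch on synthetic data (not the paper's evaluation code): the voxel-wise ensemble spread is correlated with the voxel-wise error against a reference image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-path predictions and a reference image; a spatially
# varying noise scale mimics localized corruption of the input.
n_paths, shape = 8, (32, 32)
reference = rng.normal(size=shape)
noise_scale = rng.uniform(0.05, 0.5, size=shape)
preds = reference + rng.normal(size=(n_paths, *shape)) * noise_scale

fused = np.median(preds, axis=0)       # ensemble prediction (median fusion)
uncertainty = preds.std(axis=0)        # voxel-wise path-to-path spread
error = np.abs(fused - reference)      # voxel-wise error vs. reference

# Pearson correlation between the flattened error and uncertainty maps.
r = np.corrcoef(error.ravel(), uncertainty.ravel())[0, 1]
print(f"error-uncertainty correlation: {r:.3f}")
```

A strong positive correlation, as the paper reports under simulated corruptions, means the high-uncertainty voxels are indeed the ones most likely to be wrong.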

Caveats & next steps

  • Single-institution data. Multi-center validation is needed to prove generalizability.
  • Post-hoc ensemble. Fine-tuning blocks on larger, pathology-rich datasets could push Dice higher and MAE lower.
  • Human-in-the-loop studies. How do technologists and radiologists actually use the uncertainty maps during workflow?

We have released our code on GitHub and are continuing to evaluate the technique in a PET/MR attenuation-correction pipeline.

“SASWISE-UE turns every deep-learning model into a crowd of second opinions—complete with a confidence gauge.” — Alan B. McMillan, senior author

Read the article (https://dx.doi.org/10.1016/j.compbiomed.2025.110258) and the earlier pre-print (https://arxiv.org/abs/2411.05324) for more details. The source code is available on GitHub: https://github.com/search?q=saswise-ue&type=repositories.