Melanoma classification for SIIM-ISIC Kaggle competition

Abstract

This research explores resource-efficient approaches to melanoma classification from dermoscopic images, addressing the challenge of operating under computational constraints in the context of the SIIM-ISIC 2020 competition. Top-performing submissions typically employ ensembles of multiple models trained on extensive datasets over several days using powerful GPU clusters. We demonstrate that competitive performance can be achieved with significantly fewer resources through strategic data sampling and model selection. Our approach constructs a smaller, high-quality dataset of 3,987 images (compared to the original 33,126) that maintains diversity across patients while preserving all melanoma cases. A model trained on this subset achieved a ROC-AUC of 0.845 on the competition test data. This represents a substantial efficiency improvement, reducing training time from hours to minutes while maintaining performance comparable to models trained on the full dataset. We also investigated specialized models based on sex and image brightness characteristics, though these underperformed because the data was insufficient to train separate feature extractors. Our findings suggest that in medical imaging applications with highly imbalanced datasets, thoughtful dataset construction can be more valuable than raw data volume, making accurate melanoma detection more accessible in resource-limited healthcare environments. This work contributes to the broader effort of developing efficient AI solutions for medical applications where computational resources may be limited but diagnostic accuracy remains critical.
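The dataset construction described in the abstract (keep every melanoma case, then subsample the remaining images while spreading them across patients) can be illustrated with a minimal sketch. The exact sampling procedure is given in the full report, not here, so the round-robin rule below is only one plausible way to maintain patient diversity. The function name `build_subset`, the toy rows, and the field names `patient_id` and `target` (assumed to mirror the competition's training metadata) are illustrative assumptions.

```python
import random
from collections import defaultdict


def build_subset(rows, n_total, seed=0):
    """Keep every melanoma row, then fill the remaining budget with benign
    rows drawn round-robin across patients (an assumed sampling rule)."""
    rng = random.Random(seed)
    positives = [r for r in rows if r["target"] == 1]
    if len(positives) > n_total:
        raise ValueError("n_total is smaller than the number of melanoma cases")

    # Group benign images by patient and shuffle within each patient.
    benign_by_patient = defaultdict(list)
    for r in rows:
        if r["target"] == 0:
            benign_by_patient[r["patient_id"]].append(r)
    queues = list(benign_by_patient.values())
    for q in queues:
        rng.shuffle(q)
    rng.shuffle(queues)

    budget = n_total - len(positives)
    benign = []
    # Each pass over the queues takes at most one image per patient,
    # so no patient dominates the benign part of the subset.
    while budget > 0 and queues:
        remaining = []
        for q in queues:
            if budget == 0:
                break
            benign.append(q.pop())
            budget -= 1
            if q:
                remaining.append(q)
        queues = remaining
    return positives + benign


# Toy metadata: 4 melanoma images and 50 benign images from 5 patients.
rows = (
    [{"image_name": f"m{i}", "patient_id": f"P{i % 3}", "target": 1} for i in range(4)]
    + [{"image_name": f"b{i}", "patient_id": f"P{i % 5}", "target": 0} for i in range(50)]
)
subset = build_subset(rows, n_total=12, seed=0)
print(len(subset), sum(r["target"] for r in subset))
```

With real metadata, `rows` would be loaded from the training CSV and `n_total` set to the desired subset size (3,987 in this work). Sampling per patient rather than per image is a design choice that avoids filling the benign budget with near-duplicate images from a few heavily photographed patients.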

Further Reading

For more details, you can read the full report here: Melanoma classification for SIIM-ISIC Kaggle competition (PDF)