Machine Learning Models for Breast Cancer Diagnosis Using Ultrasound Images

Publication Date : Feb-20-2026

DOI: 10.70251/HYJR2348.41777784

Author(s) :

Arshia Ghatak, Jeremy Hitt.

Volume/Issue :

Volume 4

Issue 1

(Feb - 2026)

Abstract :

Breast cancer is one of the most commonly diagnosed and deadliest diseases among women worldwide, and early detection is critical for improving survival outcomes. Ultrasound imaging is widely used in breast cancer screening due to its safety, low cost, and effectiveness in visualizing soft tissue. However, image interpretation depends heavily on radiologist expertise and is therefore subject to variability. This study explores the use of classical machine learning approaches for automated breast ultrasound classification, emphasizing interpretability and reliability in small, clinically annotated datasets. Using a publicly available breast ultrasound dataset, images were preprocessed and transformed into engineered feature representations capturing texture, shape, and intensity characteristics commonly used in clinical assessment. Three supervised machine learning models, a Support Vector Machine (SVM), a Random Forest (RF), and a Multilayer Perceptron (MLP), were trained and evaluated to classify images as benign, malignant, or normal tissue. To address class imbalance, random oversampling was applied, and model performance was assessed using class-specific accuracy metrics and confusion matrices. Among the individual models, Random Forest demonstrated the strongest and most balanced performance across all tissue classes, with texture- and shape-based features contributing most significantly to its predictions. A weighted ensemble voting classifier was then implemented to combine the strengths of all three models, assigning greater influence to the Random Forest based on validation performance. The ensemble achieved improved overall classification performance and enhanced malignant tissue detection compared with each individual classifier. These results demonstrate that ensemble-based classical machine learning methods offer a practical, interpretable, and low-resource approach for automated breast cancer detection using ultrasound imaging.
A deployable graphical user interface was also developed to make the system accessible to both clinicians and non-experts.
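
The two techniques named in the abstract, random oversampling and weighted ensemble voting, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes soft voting (a weighted average of class probabilities), since the abstract does not say whether voting was hard or soft. The function names, the toy feature values, the per-model probabilities, and the weights (Random Forest counted twice as heavily as the other two models) are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Class order used for every probability vector below.
CLASSES = ["benign", "malignant", "normal"]


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def weighted_soft_vote(probas, weights):
    """Average per-model class probabilities using per-model weights.
    Returns the winning class label and the averaged probabilities."""
    total = sum(weights.values())
    avg = [
        sum(weights[m] * probas[m][i] for m in probas) / total
        for i in range(len(CLASSES))
    ]
    best = max(range(len(CLASSES)), key=avg.__getitem__)
    return CLASSES[best], avg


# Hypothetical imbalanced toy set: 3 benign, 2 malignant, 1 normal.
x = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = ["benign", "benign", "benign", "malignant", "malignant", "normal"]
x_bal, y_bal = random_oversample(x, y)
balanced_counts = Counter(y_bal)

# Hypothetical class probabilities from each model for one image.
probas = {
    "svm": [0.50, 0.40, 0.10],
    "rf": [0.30, 0.60, 0.10],
    "mlp": [0.45, 0.45, 0.10],
}
# Hypothetical weights giving the Random Forest greater influence.
weights = {"svm": 1.0, "rf": 2.0, "mlp": 1.0}
label, avg = weighted_soft_vote(probas, weights)
print(balanced_counts, label, avg)
```

In this toy case the SVM alone would favor the benign class, but the heavier Random Forest weight shifts the combined prediction to malignant. Oversampling is normally applied only to the training split, so that duplicated samples do not leak into the validation or test data.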

American Journal of Student Research®