Bridging Data Scarcity in Medicine through Distribution-Driven Synthesis and Comparative Statistical Evaluation – American Journal of Student Research

Publication Date: Dec-09-2025

DOI: 10.70251/HYJR2348.36752763


Author(s): Anya Chiang


Volume/Issue: Volume 3, Issue 6 (Dec 2025)



Abstract:

Reliable statistical modeling in medicine often faces a fundamental limitation: the scarcity of numerical patient data. Ethical, logistical, and financial constraints restrict large-scale clinical data collection, leading to small sample sizes that weaken statistical inference, inflate variance, and obscure nonlinear relationships among physiological variables. To address this limitation, the present study employs a data synthesis framework that expands an authentic Kaggle-sourced medical dataset of 80 patient records, each characterized by demographic, physiological, and lifestyle attributes, into a statistically equivalent large-sample version of 1,000 observations. Numerical variables were modeled through empirical and Gaussian-based distributions, while categorical variables were generated via probabilistic sampling to preserve realistic frequency structures. Comparative statistical analyses demonstrate that the synthesized dataset closely replicates the distributional, correlational, and categorical properties of the original while improving stability, representativeness, and parameter reliability. The enlarged dataset also makes it easier to detect nonlinear and interaction effects previously obscured by sample constraints. Overall, this study validates statistically guided data synthesis as an effective strategy for overcoming medical data scarcity and improving the robustness of health analytics. The findings emphasize that controlled dataset expansion can complement empirical data collection, supporting more reliable inference, generalizable modeling, and evidence-based decision-making in quantitative biomedical research.
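
The synthesis and comparison steps described in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: the function names `synthesize` and `compare`, the column names, and the eight toy records are all hypothetical, and the sketch assumes a single Gaussian fit per numerical column and frequency-weighted sampling per categorical column. It also samples each column independently, so it does not reproduce the correlational structure the abstract reports; that would require a joint model.

```python
import random
import statistics
from collections import Counter


def synthesize(records, n_out, numeric_cols, categorical_cols, seed=0):
    """Expand a small table (list of dicts) into n_out synthetic rows.

    Assumption: columns are sampled independently of one another,
    so cross-column correlations are NOT preserved.
    """
    rng = random.Random(seed)

    # Fit a Gaussian (sample mean, sample SD) to each numerical column.
    gaussians = {}
    for col in numeric_cols:
        values = [r[col] for r in records]
        gaussians[col] = (statistics.mean(values), statistics.stdev(values))

    # Record observed level frequencies for each categorical column.
    frequencies = {}
    for col in categorical_cols:
        counts = Counter(r[col] for r in records)
        frequencies[col] = (list(counts.keys()), list(counts.values()))

    rows = []
    for _ in range(n_out):
        row = {}
        for col, (mu, sigma) in gaussians.items():
            row[col] = rng.gauss(mu, sigma)
        for col, (levels, weights) in frequencies.items():
            # Draw one level with probability proportional to its count.
            row[col] = rng.choices(levels, weights=weights, k=1)[0]
        rows.append(row)
    return rows


def compare(original, synthetic, numeric_cols, categorical_cols):
    """Return side-by-side summary statistics for both datasets."""
    report = {}
    for col in numeric_cols:
        o = [r[col] for r in original]
        s = [r[col] for r in synthetic]
        report[col] = {
            "mean": (statistics.mean(o), statistics.mean(s)),
            "sd": (statistics.stdev(o), statistics.stdev(s)),
        }
    for col in categorical_cols:
        o = Counter(r[col] for r in original)
        s = Counter(r[col] for r in synthetic)
        report[col] = {
            level: (o[level] / len(original), s[level] / len(synthetic))
            for level in o
        }
    return report


# Hypothetical toy records standing in for the small source dataset.
original = [
    {"age": 34, "bmi": 22.1, "smoker": "no"},
    {"age": 51, "bmi": 27.4, "smoker": "yes"},
    {"age": 45, "bmi": 24.9, "smoker": "no"},
    {"age": 62, "bmi": 30.2, "smoker": "no"},
    {"age": 29, "bmi": 21.3, "smoker": "yes"},
    {"age": 57, "bmi": 28.8, "smoker": "no"},
    {"age": 40, "bmi": 25.5, "smoker": "no"},
    {"age": 48, "bmi": 26.0, "smoker": "yes"},
]

synthetic = synthesize(original, 1000, ["age", "bmi"], ["smoker"])
report = compare(original, synthetic, ["age", "bmi"], ["smoker"])

for column, stats in report.items():
    print(column, stats)
```

Each entry in `report` pairs the original value with the synthetic one, which is the simplest form of the comparative check the abstract mentions. A fuller evaluation would add distributional tests and a correlation comparison, neither of which this sketch attempts.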