Inputting Missing Values in Mechanical Materials Data: Accuracy and Statistical Effect of Mean, Median, and KNN Methods – American Journal of Student Research

American Journal of Student Research

Inputting Missing Values in Mechanical Materials Data: Accuracy and Statistical Effect of Mean, Median, and KNN Methods

Publication Date : Feb-10-2026

DOI: 10.70251/HYJR2348.42319326


Author(s) :

Jaydn Su


Volume/Issue :
Volume 4, Issue 2 (Feb - 2026)



Abstract :

This study examines how different methods for handling missing data affect the accuracy of mechanical property datasets. A dataset containing ultimate tensile strength (Su) and related mechanical properties was used, with 100 Su values randomly removed to simulate realistic data loss. Three commonly used methods were tested: mean substitution, median substitution, and k-nearest neighbor (KNN) imputation. Each completed dataset was then compared to the original to assess how well statistical relationships were preserved. Among the tested methods, median imputation produced the most accurate reconstruction for this dataset under the missing-completely-at-random (MCAR) simulation, maintaining a near-exact correlation with the original Su values (Pearson correlation coefficient r = 0.9987) and a coefficient of determination (R²) of 0.9487 in the linear regression model. Mean and KNN imputation both performed adequately but introduced larger deviations from the original relationships. Overall, within this specific mechanical-property dataset and under the MCAR missingness simulated here, median imputation provided the most effective balance between accuracy and preservation of statistical structure among the three methods, suggesting that it may serve as a practical option for researchers and engineers who regularly work with incomplete mechanical property data.
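
The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the study's code. The data are synthetic and hypothetical (a single yield-strength feature `sy` with Su generated as a noisy linear function of it), and the sample size of 500, the choice of k = 5, and all function names are assumptions introduced here. The values it prints describe the synthetic data only and should not be expected to match the figures reported above.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mask_mcar(values, n_missing, rng):
    """Copy `values`, setting n_missing entries chosen uniformly at random to None."""
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]


def impute_constant(values, stat):
    """Fill every missing entry with stat(observed values), e.g. mean or median."""
    fill = stat([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def impute_knn(values, features, k=5):
    """Fill each missing entry with the mean of the k observed rows nearest in feature space."""
    observed = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is None:
            nearest = sorted(
                observed, key=lambda j: math.dist(features[i], features[j])
            )[:k]
            out[i] = statistics.fmean(values[j] for j in nearest)
    return out


# Synthetic stand-in data (hypothetical; NOT the dataset used in the study).
rng = random.Random(0)
sy = [rng.uniform(200.0, 1200.0) for _ in range(500)]  # yield strength, MPa
su = [1.2 * y + rng.gauss(0.0, 40.0) for y in sy]      # tensile strength, MPa
features = [(y,) for y in sy]                          # one feature per row

# Remove 100 Su values at random, then complete the column three ways.
su_missing = mask_mcar(su, 100, rng)
completed = {
    "mean": impute_constant(su_missing, statistics.fmean),
    "median": impute_constant(su_missing, statistics.median),
    "knn": impute_knn(su_missing, features, k=5),
}

# Compare each completed column with the original values.
scores = {name: pearson_r(su, filled) for name, filled in completed.items()}
for name, r in scores.items():
    print(f"{name:>6}: r = {r:.4f}")
```

With more than one feature, the columns would normally be scaled to comparable ranges before computing distances, since an unscaled Euclidean distance is dominated by whichever feature has the largest numeric spread. How the methods rank depends on how strongly the other properties predict Su and on how skewed the Su distribution is, which is consistent with the abstract limiting its conclusion to this dataset and missingness mechanism.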