Evaluation of Imputation Methods for Handling Missing Data in Mechanical Materials Datasets
Publication Date : Nov-01-2024
Abstract :
Mechanical design materials are integral to advancing modern technologies, and their mechanical properties determine their suitability for a given engineering application. However, datasets of mechanical materials often contain missing values, which can bias results and reduce statistical power. Although several imputation methods exist for handling missing data, few studies have evaluated their performance specifically on mechanical materials data. To address this gap, a comprehensive dataset was obtained, and 10% of the original values of ultimate tensile strength (Su) and yield strength (Sy) were intentionally deleted. Four imputation methods were used to restore the missing values: mean imputation, random fill, regression imputation, and k-nearest neighbors (KNN) imputation. Their performance was evaluated using Pearson’s correlation, multiple linear regression, and permutation feature importance. Mean and KNN imputation gave the closest match to the original data; regression imputation also performed well, with minor deviations; random fill was the least reliable. These findings offer guidance on selecting imputation techniques for mechanical materials datasets, which can improve the robustness of future research.
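The delete-then-impute procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic, the `hardness` predictor and its coefficients are hypothetical, "random fill" is assumed to mean drawing from the observed values, regression and KNN use a single predictor, and RMSE on the deleted entries is added as a simple check alongside Pearson's correlation. Results on this toy data say nothing about the rankings reported above.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mask_values(values, fraction, rng):
    """Copy `values`, set a random `fraction` of entries to None, return copy and indices."""
    idx = sorted(rng.sample(range(len(values)), round(len(values) * fraction)))
    masked = list(values)
    for i in idx:
        masked[i] = None
    return masked, idx


def impute_mean(target, predictor, rng):
    mean = statistics.fmean([v for v in target if v is not None])
    return [mean if v is None else v for v in target]


def impute_random(target, predictor, rng):
    # Assumption: "random fill" draws a replacement from the observed values.
    observed = [v for v in target if v is not None]
    return [rng.choice(observed) if v is None else v for v in target]


def impute_regression(target, predictor, rng):
    # Ordinary least squares of target on one predictor, fitted on observed rows.
    pairs = [(p, t) for p, t in zip(predictor, target) if t is not None]
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return [intercept + slope * p if t is None else t for p, t in zip(predictor, target)]


def impute_knn(target, predictor, rng, k=5):
    # Average the target over the k observed rows closest in the predictor.
    pairs = [(p, t) for p, t in zip(predictor, target) if t is not None]
    filled = []
    for p, t in zip(predictor, target):
        if t is None:
            nearest = sorted(pairs, key=lambda pt: abs(pt[0] - p))[:k]
            t = statistics.fmean([v for _, v in nearest])
        filled.append(t)
    return filled


def rmse(true, imputed, idx):
    """Root-mean-square error over the deleted positions only."""
    return (sum((true[i] - imputed[i]) ** 2 for i in idx) / len(idx)) ** 0.5


rng = random.Random(0)
# Synthetic stand-in data: a hypothetical hardness column and a noisy linear Su.
hardness = [rng.uniform(100, 400) for _ in range(200)]
su_true = [3.4 * h + rng.gauss(0, 40) for h in hardness]

su_masked, missing_idx = mask_values(su_true, 0.10, rng)  # delete 10% of Su
r_original = pearson(hardness, su_true)

methods = {
    "mean": impute_mean,
    "random fill": impute_random,
    "regression": impute_regression,
    "knn": impute_knn,
}
filled, results = {}, {}
for name, impute in methods.items():
    filled[name] = impute(su_masked, hardness, rng)
    results[name] = {
        "rmse": rmse(su_true, filled[name], missing_idx),
        "pearson_r": pearson(hardness, filled[name]),
    }
    print(f"{name:12s} RMSE={results[name]['rmse']:.1f} "
          f"r={results[name]['pearson_r']:.3f} (original r={r_original:.3f})")
```

Comparing each method's `pearson_r` with `r_original` mirrors the idea of checking how well an imputed dataset preserves the relationships in the original one. A fuller evaluation along the lines of the abstract would also refit a multiple linear regression and compare permutation feature importances on each imputed dataset.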