Comparing simple neural network and ensemble learning models in predicting hydration energy of molecules represented by RDKit and Mordred descriptors
Publication Date : Dec-28-2024
Author(s) :
Volume/Issue :
Abstract :
Predicting molecular properties is a crucial task in the chemical sciences. Recently, considerable effort has gone into using machine learning to predict molecular properties ranging from solubility to reaction rates. This study develops machine learning models to predict the hydration energy of molecules and compares the performance of a neural network (Multi-Layer Perceptron) with that of an ensemble learning model (Random Forest). Using descriptors generated by RDKit and Mordred, we aimed to identify the molecular representation that gives the best predictive accuracy. The FreeSolv database of 642 molecules provided the experimental hydration energy data for training and testing. The models were evaluated using mean squared error (MSE) and the coefficient of determination (R²); the Multi-Layer Perceptron achieved an R² above 0.9 and outperformed the Random Forest model. The results suggest that the neural network model, in combination with RDKit descriptors, offers a strong balance between accuracy and computational efficiency. This study demonstrates that simpler machine learning models can accurately predict molecular properties, supporting broader applications in chemistry where computational resources are limited.
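The comparison described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the study's actual pipeline: the hyperparameters, the 80/20 split, the helper name `compare_models`, and the use of scikit-learn itself are assumptions made for illustration. The descriptor matrix is replaced by synthetic placeholder data, so the printed scores say nothing about FreeSolv. In practice, `X` would hold RDKit or Mordred descriptors and `y` the experimental hydration energies.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=0):
    """Fit an MLP and a Random Forest on one split; return {name: (mse, r2)}.

    Hypothetical helper: all hyperparameters below are illustrative
    assumptions, not the settings used in the study.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        # Descriptors are standardized before the MLP; tree ensembles
        # are insensitive to feature scaling, so the forest skips it.
        "MLP": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000,
                         random_state=seed),
        ),
        "RandomForest": RandomForestRegressor(n_estimators=100,
                                              random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Same two metrics as in the abstract: MSE and R².
        results[name] = (mean_squared_error(y_test, y_pred),
                         r2_score(y_test, y_pred))
    return results


# Synthetic placeholder data, NOT FreeSolv: stands in for a descriptor
# matrix (rows = molecules, columns = descriptors) and a target vector.
X, y = make_regression(n_samples=200, n_features=20, noise=1.0, random_state=0)
results = compare_models(X, y)
for name, (mse, r2) in results.items():
    print(f"{name}: MSE={mse:.3f}, R2={r2:.3f}")
```

A single train/test split is used here for brevity; with only a few hundred molecules, repeated splits or cross-validation would give a more stable comparison between the two models.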