Funct. Mater. 2025; 32 (4): 715-722.
Data-driven discovery of functional materials: LARS–LASSO logistic regression for QSAR/QSPR design of compounds with anti-COVID-19 and other activities
School of Chemistry, V. N. Karazin Kharkiv National University, Svobody sq., 4, Kharkiv, 61022, Ukraine
The applicability of L1 regularization to the construction of logistic classification equations for quantitative/qualitative structure-activity/property relationships (QSAR/QSPR) has been investigated. The LASSO (least absolute shrinkage and selection operator) variant of least angle regression (LARS) has been implemented for logistic regression. The method was used to build simple classification functions for three tasks: evaluating the basicity of various organic compounds towards the Li+ cation, studying the binding affinity of various organic molecules to the estrogen receptor, and predicting activity against the COVID-19 main protease. The resulting classification functions have satisfactory predictive performance. The results provide a foundation for investigating the electronic and spatial structures of potential ligands exhibiting the desired activity. A comparative analysis of chemoinformatics approaches facilitates the optimization of lead identification methodologies.
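To illustrate how an L1 penalty yields a sparse logistic classification function, the following minimal sketch fits an L1-regularized logistic regression on synthetic data. It is not the LARS path algorithm described above: it uses proximal gradient descent with soft-thresholding, which minimizes the same kind of L1-penalized logistic loss for a single penalty value. The function names (`make_toy_data`, `fit_l1_logistic`, `predict`), the penalty value `lam`, the step size, and the random "descriptors" are all assumptions made for illustration and do not come from the paper.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink towards zero, clip small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_data(n, seed=0, n_descriptors=6):
    """Synthetic stand-in for a descriptor matrix (hypothetical data).

    Only descriptors 0 and 1 determine the class; the rest are noise.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_descriptors)] for _ in range(n)]
    y = [1 if row[0] - row[1] > 0 else 0 for row in X]
    return X, y


def fit_l1_logistic(X, y, lam=0.05, step=0.5, n_iter=300):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict(X, w, b):
    """Assign class 1 when the linear classification function is positive."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]


X, y = make_toy_data(200)
w, b = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
accuracy = sum(p == t for p, t in zip(predict(X, w, b), y)) / len(y)
print("selected descriptors:", selected)
print("training accuracy:", round(accuracy, 3))
```

Increasing `lam` removes more descriptors from the classification function, and a sufficiently large value sets every coefficient to exactly zero. A LARS-type procedure differs in that it traces the whole sequence of such models as the penalty varies, rather than solving for one penalty value at a time.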