Funct. Mater. 2025; 32 (4): 715-722.

doi:https://doi.org/10.15407/fm32.04.715

Data-driven discovery of functional materials: LARS–LASSO logistic regression for QSAR/QSPR design of compounds with anti-COVID-19 and other activities

M. I. Berdnyk, D. O. Anokhin, I. V. Khristenko, V. V. Ivanov, S. M. Kovalenko, O. N. Kalugin

School of Chemistry, V. N. Karazin Kharkiv National University, Svobody sq., 4, Kharkiv, 61022, Ukraine

Abstract: 

The possibility of using L1 regularization to obtain logistic classification equations for quantitative/qualitative structure-activity/property relationships (QSAR/QSPR) has been investigated. The least absolute shrinkage and selection operator (LASSO) variant of least angle regression (LARS) has been implemented for logistic regression. The method was used to build simple classification functions for three tasks: evaluating the basicity of different organic compounds towards the Li+ cation, studying the binding affinity of various organic molecules to the estrogen receptor, and predicting activity against the COVID-19 main protease. The resulting classification functions have satisfactory predictive properties. These results provide a foundation for investigating the electronic and spatial structures of potential ligands exhibiting the desired activity. A comparative analysis of chemoinformatics approaches facilitates the optimization of lead identification methodologies.
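
To illustrate why an L1 penalty tends to yield simple classification functions, the following is a minimal sketch of L1-regularized logistic regression on synthetic data. It is not the LARS path algorithm described in the abstract: it swaps in plain proximal gradient descent (ISTA), which minimizes the same kind of penalized objective. The descriptor matrix `X`, the labels `y`, the penalty strength `lam`, and all function names are assumptions made for this example only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks v toward zero and
    # returns exactly zero when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=500):
    # Proximal gradient descent (ISTA) on the mean logistic loss
    # plus lam * sum(|w_j|). The intercept b is not penalized.
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Synthetic "descriptors" (hypothetical): 5 features, of which only
# the first two influence the class label.
random.seed(0)
X, y = [], []
for _ in range(100):
    row = [random.gauss(0.0, 1.0) for _ in range(5)]
    score = 2.0 * row[0] - 1.5 * row[1] + 0.3 * random.gauss(0.0, 1.0)
    X.append(row)
    y.append(1 if score > 0 else 0)

w_sparse, b_sparse = fit_l1_logistic(X, y, lam=0.05)

# Training accuracy of the fitted classification function.
hits = 0
for xi, yi in zip(X, y):
    prob = sigmoid(b_sparse + sum(wj * xj for wj, xj in zip(w_sparse, xi)))
    hits += int((prob >= 0.5) == (yi == 1))
accuracy = hits / len(y)

print("coefficients:", [round(wj, 3) for wj in w_sparse])
print("training accuracy:", accuracy)
```

Increasing `lam` drives more coefficients to exactly zero, which is what makes the resulting classification function short; a path algorithm such as LARS traces how the set of selected descriptors changes as the penalty varies, rather than fitting at a single fixed value as this sketch does.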

Keywords: 
QSAR-QSPR, Logistic Regression, classification, LARS, LASSO, COVID-19
