Predicting drug solubility in organic solvents mixtures: A machine-learning approach supported by high-throughput experimentation

Barolo M.; Bezzo F.; Facco P.
2024

Abstract

A novel approach based on supervised machine learning is proposed to predict the solubility of drugs and drug-like molecules in mixtures of organic solvents. Similar to quantitative structure-property relationship (QSPR) models, different solvent types are identified by molecular descriptors, which, in this study, are considered as UNIFAC subgroups. To overcome the potential lack of UNIFAC subgroups for the complex Active Pharmaceutical Ingredients (APIs) currently developed in the pharmaceutical industry, the API molecule is considered as a unique entity in the proposed modelling approach. Therefore, API solubility is predicted as a function of temperature, functional subgroups of the solvents and composition of the solvent mixture; in turn, the regressors' correlation is handled through Partial Least-Squares (PLS) regression. The method is developed and tested with experimental data of a real API and 14 organic solvents that are industrially employed for crystallisation. Solubility predictions are accurate and precise for single solvents, binary mixtures and ternary mixtures of organic solvents at different compositions and temperatures, with a determination coefficient R² ≥ 0.90. To further test the applicability of the model, the proposed approach is applied to 9 literature organic solubility datasets of drugs and drug-like compounds and compared to benchmark solubility models in the literature. Results show that the proposed approach provides satisfactory predictions: the majority of validation and calibration data have R² = 0.95-0.99; the ratio between the RMSE (root mean squared error) of the proposed method and the range of measured solubility values is from 1 to 3 orders of magnitude smaller than the RMSE ratio obtained by the benchmark models.
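The modelling idea in the abstract (temperature, solvent UNIFAC subgroups and mixture composition as correlated regressors, handled by PLS, then judged by R² and by RMSE relative to the measured range) can be illustrated with a minimal sketch. This is not the authors' implementation. The following are all assumptions made for illustration: the three-solvent subset, the mole-fraction weighting of subgroup counts, the autoscaling, the NIPALS-style single-response PLS, the function names, and the response values, which are synthetic and noise-free with arbitrary coefficients rather than measured solubilities.

```python
import math

# Illustrative subset of solvents with textbook UNIFAC subgroup counts.
SUBGROUPS = ["CH3", "CH2", "OH", "CH3CO", "CH3COO"]
SOLVENTS = {
    "ethanol": {"CH3": 1, "CH2": 1, "OH": 1},
    "acetone": {"CH3": 1, "CH3CO": 1},
    "ethyl acetate": {"CH3": 1, "CH2": 1, "CH3COO": 1},
}

def regressors(temperature_k, mixture):
    """Assumed encoding: [T, mole-fraction-weighted count of each subgroup]."""
    return [temperature_k] + [
        sum(x * SOLVENTS[s].get(g, 0) for s, x in mixture.items())
        for g in SUBGROUPS
    ]

def pls1_fit(X, y, n_components):
    """Single-response PLS on autoscaled X, with deflation after each component."""
    n, m = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(m)]
    # Constant columns get a unit scale so that they do not divide by zero.
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / (n - 1)) or 1.0
          for j in range(m)]
    E = [[(r[j] - mu[j]) / sd[j] for j in range(m)] for r in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return {"mu": mu, "sd": sd, "y_mean": y_mean, "comps": comps}

def pls1_predict(model, row):
    """Score a new row component by component, deflating it as in the fit."""
    e = [(v - m) / s for v, m, s in zip(row, model["mu"], model["sd"])]
    y_hat = model["y_mean"]
    for w, p, q in model["comps"]:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat

def r2(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return 1.0 - ss_res / sum((a - mean) ** 2 for a in y)

def rmse_over_range(y, y_hat):
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))
    return rmse / (max(y) - min(y))

def synthetic_response(row):
    """Made-up linear response with arbitrary coefficients (NOT measured data)."""
    T, _ch3, _ch2, oh, ch3co, ch3coo = row
    return 0.03 * (T - 298.15) + 0.8 * oh - 0.5 * ch3co + 0.2 * ch3coo - 2.0

MIXTURES = [
    {"ethanol": 1.0}, {"acetone": 1.0}, {"ethyl acetate": 1.0},
    {"ethanol": 0.5, "acetone": 0.5},
    {"ethanol": 0.5, "ethyl acetate": 0.5},
    {"acetone": 0.5, "ethyl acetate": 0.5},
]
X_train = [regressors(T, mix) for T in (283.15, 298.15, 313.15) for mix in MIXTURES]
y_train = [synthetic_response(r) for r in X_train]

model = pls1_fit(X_train, y_train, n_components=3)
y_fit = [pls1_predict(model, r) for r in X_train]

# Held-out ternary mixture at an intermediate temperature.
x_new = regressors(303.15, {"ethanol": 0.4, "acetone": 0.3, "ethyl acetate": 0.3})
y_new = pls1_predict(model, x_new)
print("R2:", r2(y_train, y_fit), "RMSE/range:", rmse_over_range(y_train, y_fit))
print("ternary prediction:", y_new, "synthetic truth:", synthetic_response(x_new))
```

In this toy design the subgroup columns are strongly collinear (for instance, the CH2 count equals the sum of the OH and CH3COO counts), which is the situation where a latent-variable method such as PLS is a natural choice over ordinary least squares. Because the synthetic response is exactly linear and noise-free, the near-perfect fit here says nothing about the accuracy reported in the paper.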
Files in this item:
File: CenciEtAl_IntJPharm2024.pdf
Access: open access
Type: Published (publisher's version)
License: Creative Commons
Size: 2.93 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3525948