
Modeling the role of genetic factors in characterizing extra-intestinal manifestations in Crohn's disease patients: does this improve outcome predictions?

GREGORI, DARIO
2007

Abstract

OBJECTIVE: To evaluate to what extent an inefficient statistical model affects the study of genetic factors in extra-intestinal manifestations of Crohn's disease (CD), and how clinical predictions can be improved using more adequate techniques. MATERIALS: Extra-intestinal manifestations were studied in 152 CD patients. Three sets of variables were considered: (1) disease characteristics (presentation, behavior, location); (2) generic risk factors (age, gender, smoking, and family history); and (3) genetic polymorphisms of the NOD2, CD14, TNF, IL12B, and IL1RN genes, whose involvement in CD is known or suspected. METHODS: Six statistical classifiers and data mining models were applied: (1) logistic regression as a benchmark; (2) generalized additive model; (3) projection pursuit regression; (4) linear discriminant analysis; (5) quadratic discriminant analysis; and (6) one-layer feed-forward artificial neural networks. Models were selected using the Akaike information criterion, and their accuracy was compared using several indexes. RESULTS: Extra-intestinal manifestations occurred in 75 patients. The model with clinical variables only selected family history, gender, presentation, and behavior as significantly associated with extra-intestinal manifestations. When the genetic factors were also included, family history was no longer significant and was replaced by the NOD2, TNF, and IL12B single nucleotide polymorphisms. Projection pursuit regression performed best in predicting individual outcomes (kappa statistic 0.078 [SE 0.09] without and 0.108 [SE 0.075] with genetic information). One-layer artificial neural networks did not show any particular improvement in model accuracy over nonlinear techniques. CONCLUSIONS: The correct identification of factors associated with extra-intestinal symptoms in CD, in particular the genetic ones, is highly dependent on the model chosen for the analysis. By using the most sophisticated statistical models, the accuracy of prediction can be strengthened by 10-64%, compared with linear regression.
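
The kappa statistic reported above measures how far a classifier's agreement with the observed outcomes exceeds what chance alone would give. The following is a minimal sketch of Cohen's kappa for a 2x2 table of predicted versus observed outcomes. The function name and the cell counts in the usage example are hypothetical: only the totals (152 patients, 75 with extra-intestinal manifestations) come from the abstract, and the split across cells is invented for illustration, so the result is not a figure from the study.

```python
def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 table of predicted vs. observed outcomes.

    tp/fp/fn/tn are the usual confusion-matrix counts
    (true/false positives and negatives).
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("the table is empty")

    # Observed agreement: fraction of patients classified correctly.
    observed = (tp + tn) / n

    # Agreement expected by chance, from the marginal proportions.
    p_pred_pos = (tp + fp) / n
    p_obs_pos = (tp + fn) / n
    expected = p_pred_pos * p_obs_pos + (1 - p_pred_pos) * (1 - p_obs_pos)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical cell counts (assumption): 152 patients in total,
    # 75 of them with observed manifestations (tp + fn = 75).
    print(round(cohen_kappa(tp=45, fp=35, fn=30, tn=42), 3))
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which puts the reported values of 0.078 and 0.108 in context as small gains over chance.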

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/2447570