Binary classification algorithms are often used in situations when one of the two classes is extremely rare. A common practice is to oversample units of the rare class when forming the training set. For some classification algorithms, like logistic classification, there are thorethical results that justify such an approach. Similar results are not available for other popular classification algorithms like classification trees. In this paper the use of balanced datasets, when dealing with rare classes, for tree classifiers and boosting algorithms is discussed and results from analyzing a real dataset and a simulated dataset are reported.

Selecting the training set in classification problems with rare events

SCARPA, BRUNO;TORELLI, NICOLA
2004

Abstract

Binary classification algorithms are often used in situations when one of the two classes is extremely rare. A common practice is to oversample units of the rare class when forming the training set. For some classification algorithms, like logistic classification, there are thorethical results that justify such an approach. Similar results are not available for other popular classification algorithms like classification trees. In this paper the use of balanced datasets, when dealing with rare classes, for tree classifiers and boosting algorithms is discussed and results from analyzing a real dataset and a simulated dataset are reported.
2004
New Developments in Classification and Data Analysis
9783540238096
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/188716
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? 2
social impact