Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results

Silberzahn, R.; Uhlmann, E. L.; Martin, D. P.; Anselmi, P.; Aust, F.; Awtrey, E.; Bahník, Š.; Bai, F.; Bannard, C.; Bonnier, E.; Carlsson, R.; Cheung, F.; Christensen, G.; Clay, R.; Craig, M. A.; Dalla Rosa, A.; Dam, L.; Evans, M. H.; Flores Cervantes, I.; Fong, N.; Gamez-Djokic, M.; Glenz, A.; Gordon-McKeon, S.; Heaton, T. J.; Hederos, K.; Heene, M.; Hofelich Mohr, A. J.; Högden, F.; Hui, K.; Johannesson, M.; Kalodimos, J.; Kaszubowski, E.; Kennedy, D. M.; Lei, R.; Lindsay, T. A.; Liverani, S.; Madan, C. R.; Molden, D.; Molleman, E.; Morey, R. D.; Mulder, L. B.; Nijstad, B. R.; Pope, N. G.; Pope, B.; Prenoveau, J. M.; Rink, F.; Robusto, E.; Roderique, H.; Sandberg, A.; Schlüter, E.; Schönbrodt, F. D.; Sherman, M. F.; Sommer, S. A.; Sotak, K.; Spain, S.; Spörlein, C.; Stafford, T.; Stefanutti, L.; Tauber, S.; Ullrich, J.; Vianello, M.; Wagenmakers, E.-J.; Witkowiak, M.; Yoon, S.; Nosek, B. A.

doi:10.1177/2515245917747646

Twenty-nine teams involving 61 analysts used the same data set to address the same research question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. Analytic approaches varied widely across the teams, and the estimated effect sizes ranged from 0.89 to 2.93 (Mdn = 1.31) in odds-ratio units. Twenty teams (69%) found a statistically significant positive effect, and 9 teams (31%) did not observe a significant relationship. Overall, the 29 different analyses used 21 unique combinations of covariates. Neither analysts’ prior beliefs about the effect of interest nor their level of expertise readily explained the variation in the outcomes of the analyses. Peer ratings of the quality of the analyses also did not account for the variability. These findings suggest that significant variation in the results of analyses of complex data may be difficult to avoid, even by experts with honest intentions. Crowdsourcing data analysis, a strategy in which numerous research teams are recruited to simultaneously investigate the same research question, makes transparent how defensible, yet subjective, analytic choices influence research results.
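
For readers unfamiliar with odds-ratio units, the sketch below shows how an odds ratio can be computed from a 2x2 table and how the reported shares of teams follow from the counts in the abstract. The function name `odds_ratio` and the red-card counts are hypothetical and chosen only for illustration; they are not taken from the study's data set, and the participating teams used a variety of models rather than this simple calculation. An odds ratio above 1 would indicate higher odds of a red card for dark-skin-toned players, and a value of 1 would indicate no difference.

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Return the odds of an event in group A divided by the odds in group B."""
    odds_a = events_a / non_events_a
    odds_b = events_b / non_events_b
    return odds_a / odds_b


# Hypothetical counts (NOT from the study): red cards vs. no red card
# for two groups of player-referee encounters.
example_or = odds_ratio(events_a=30, non_events_a=970,
                        events_b=20, non_events_b=980)

# Shares reported in the abstract: 20 of 29 teams found a significant
# positive effect, and 9 of 29 did not.
total_teams = 29
share_significant = round(100 * 20 / total_teams)
share_not_significant = round(100 * 9 / total_teams)

print(f"Illustrative odds ratio: {example_or:.2f}")
print(f"Significant: {share_significant}%, not significant: {share_not_significant}%")
```
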