Combining exchangeable P-values
Matteo Gasparin
2025
Abstract
The problem of combining P-values is an old and fundamental one, and the classic assumption of independence is often violated or unverifiable in applications. There are many well-known rules that can combine a set of arbitrarily dependent P-values (for the same hypothesis) into a single P-value. We show that essentially all these existing rules can be strictly improved when the P-values are exchangeable, or when external randomization is allowed (or both). For example, we derive randomized and/or exchangeable improvements of well-known rules like “twice the median” and “twice the average,” as well as geometric and harmonic means. Exchangeable P-values are often produced one at a time (for example, under repeated tests involving data splitting), and our rules can combine them sequentially as they are produced, stopping when the combined P-values stabilize. Our work also improves rules for combining arbitrarily dependent P-values, since the latter become exchangeable when they are presented to the analyst in a random order. The main technical advance is to show that all existing combination rules can be obtained by calibrating the P-values to e-values (using an α-dependent calibrator), averaging those e-values, converting to a level-α test using Markov’s inequality, and finally obtaining P-values by combining this family of tests; the improvements are delivered via recent randomized and exchangeable variants of Markov’s inequality.
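The calibrate, average, Markov, and invert pipeline described in the abstract can be sketched for the "twice the average" rule. This is a minimal illustration under stated assumptions, not the authors' implementation. The calibrator (2 - 2p/α)_+ / α is one illustrative choice that integrates to 1 on [0, 1]. The function names are hypothetical. The running-average form of the exchangeable variant is our reading of the exchangeable Markov inequality rather than a formula quoted from the abstract.

```python
from statistics import mean


def twice_the_average(pvals):
    """Classic rule: valid under arbitrary dependence, capped at 1."""
    return min(1.0, 2.0 * mean(pvals))


def calibrate(p, alpha):
    """Assumed alpha-dependent calibrator: maps a P-value to an e-value.

    (2 - 2p/alpha)_+ / alpha is decreasing in p and integrates to 1
    over [0, 1] for alpha in (0, 1], so its expectation is at most 1
    under the null hypothesis.
    """
    return max(0.0, 2.0 - 2.0 * p / alpha) / alpha


def rejects(pvals, alpha):
    """Level-alpha test via Markov: reject if the mean e-value >= 1/alpha."""
    e_bar = mean([calibrate(p, alpha) for p in pvals])
    return e_bar >= 1.0 / alpha


def combined_p_value(pvals, tol=1e-9):
    """Smallest alpha at which the test rejects, found by bisection.

    The rejection region grows with alpha, so bisection is valid.
    """
    if not rejects(pvals, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: rejects at hi, does not reject at lo
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if rejects(pvals, mid):
            hi = mid
        else:
            lo = mid
    return hi


def exchangeable_twice_average(pvals):
    """Assumed exchangeable variant: twice the smallest running average.

    Processes P-values in the order given, so it can be updated
    sequentially as new P-values arrive.
    """
    total, best = 0.0, 1.0
    for k, p in enumerate(pvals, start=1):
        total += p
        best = min(best, 2.0 * total / k)
    return best


if __name__ == "__main__":
    pvals = [0.001, 0.001, 0.001, 0.9]
    print(twice_the_average(pvals))
    print(combined_p_value(pvals))
    print(exchangeable_twice_average(pvals))
```

Because the calibrator truncates at zero, the P-value from this sketch is never larger than twice the average, and the two agree when no P-value exceeds the resulting α. The running-average variant depends on the order of its inputs, which is why it requires exchangeability (or a random shuffle of the P-values) to be valid.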




