Weighted Kolmogorov-Smirnov Two-Sample Test with threshold

ks_test(x, y, thresh = 0.05, w_x = rep(1, length(x)), w_y = rep(1, length(y)))

Arguments

x

Vector of values sampled from the first distribution

y

Vector of values sampled from the second distribution

thresh

The threshold that the distance between the two cumulative distributions must exceed; a value between 0 and 1

w_x

The observation weights for x

w_y

The observation weights for y

Value

A list with class "htest" containing the following components:

  • statistic the value of the test statistic.

  • p.value the p-value of the test.

  • alternative a character string describing the alternative hypothesis.

  • method a character string indicating what type of test was performed.

  • data.name a character string giving the name(s) of the data.

Details

The usual Kolmogorov-Smirnov test for two vectors X and Y, of sizes m and n, relies on the empirical cdfs \(E_x\) and \(E_y\) and the test statistic $$D = \sup_{t\in (X, Y)} |E_x(t) - E_y(t)|$$. This modified Kolmogorov-Smirnov test introduces two modifications.

  • Using observation weights for both vectors X and Y: the weights change the usual KS test in two places. First, the empirical cdfs are updated to account for the weights. Second, the effective sample sizes are modified as well. This is inspired by https://stackoverflow.com/a/55664242/13768995, using Monahan (2011).

  • Testing against a threshold: the test statistic is thresholded, that is, \(D\) is replaced by \(\max(D - thresh, 0)\). Since \(0\le D\le 1\), the threshold also lies between 0 and 1 and represents an effect size for the difference.
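The two modifications can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper names `weighted_ecdf`, `effective_n` and `thresholded_ks_stat` are hypothetical, and the effective sample size formula (sum of weights squared, divided by the sum of squared weights) is an assumption about how the sample sizes are adjusted. The p-value computation is left out.

```r
# Weighted empirical cdf of `x`, evaluated at each point in `at`.
# With unit weights this reduces to the usual empirical cdf.
weighted_ecdf <- function(x, w, at) {
  vapply(at, function(a) sum(w[x <= a]) / sum(w), numeric(1))
}

# Effective sample size under weights (assumed formula, see lead-in).
# With unit weights this equals length(w).
effective_n <- function(w) sum(w)^2 / sum(w^2)

# Weighted two-sample KS distance, shifted by `thresh` and floored at 0.
thresholded_ks_stat <- function(x, y, thresh = 0.05,
                                w_x = rep(1, length(x)),
                                w_y = rep(1, length(y))) {
  # The supremum is attained at one of the pooled observations.
  grid <- sort(unique(c(x, y)))
  d <- max(abs(weighted_ecdf(x, w_x, grid) - weighted_ecdf(y, w_y, grid)))
  max(d - thresh, 0)
}

x <- c(0.1, 0.2, 0.3, 0.4)
y <- c(0.35, 0.45, 0.55, 0.65)
thresholded_ks_stat(x, y, thresh = 0.05)
effective_n(c(1, 1, 2))
```

With `thresh = 0` and unit weights, the sketch returns the same distance as `stats::ks.test(x, y)$statistic`, which gives a quick sanity check of the weighted empirical cdf.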

References

Monahan, J. (2011). Numerical Methods of Statistics (2nd ed., Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge: Cambridge University Press. doi:10.1017/CBO9780511977176

Examples

x <- runif(100)
y <- runif(100, min = .5, max = .5)
ks_test(x, y, thresh = .001)
#>
#>  Two-sample Weighted Kolmogorov-Smirnov test with threshold 0.001
#>
#> data:  x and y
#>  = 0.569, p-value = 1.743e-14
#> alternative hypothesis: two-sided
#>