Maximum Mean Discrepancy Unbiased Test

mmd_test(
  x,
  y,
  kernel = "rbfdot",
  type = ifelse(min(nrow(x), nrow(y)) < 1000, "unbiased", "linear"),
  null = c("permutation", "exact"),
  iterations = 10^3,
  frac = 1,
  ...
)

Arguments

x

d-dimensional samples from the first distribution

y

d-dimensional samples from the second distribution

kernel

A character string naming a known kernel. See details.

type

Which statistic to use: one of 'unbiased' or 'linear' (see Gretton et al. (2012) for details). Defaults to 'unbiased' if either sample has fewer than 1000 rows, and to 'linear' otherwise.

null

How to assess the null distribution: 'permutation' or 'exact'. 'exact' can only be used if type is 'unbiased' and the kernel is 'rbf'.

iterations

Number of permutations used to simulate the null distribution. Defaults to 10^3. Only used if null is 'permutation'.

frac

For the linear statistic, the fraction of points to sample. See details.

...

Further arguments passed to kernel functions
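To make the roles of null and iterations concrete, here is a minimal base-R sketch of a permutation null for the unbiased statistic with an RBF kernel. It is not the mmd_test implementation (which delegates kernel evaluation to scikit-learn); the helper names rbf_gram, mmd2_from_gram and permutation_mmd_test, the gamma default, and the p-value convention (fraction of permuted statistics at least as large as the observed one) are illustrative assumptions.

```r
# Hypothetical helper: RBF Gram matrix exp(-gamma * ||u - v||^2) over all rows of a.
# gamma = 1 / ncol(a) is an assumed default (it mirrors scikit-learn's "rbf").
rbf_gram <- function(a, gamma = 1 / ncol(a)) {
  sq <- rowSums(a^2)
  d2 <- outer(sq, sq, "+") - 2 * tcrossprod(a)
  exp(-gamma * pmax(d2, 0))  # clamp tiny negative rounding errors
}

# Unbiased MMD^2 from a precomputed Gram matrix K and the row indices of each sample.
mmd2_from_gram <- function(K, ix, iy) {
  kxx <- K[ix, ix, drop = FALSE]
  kyy <- K[iy, iy, drop = FALSE]
  m <- length(ix)
  n <- length(iy)
  (sum(kxx) - sum(diag(kxx))) / (m * (m - 1)) +
    (sum(kyy) - sum(diag(kyy))) / (n * (n - 1)) -
    2 * mean(K[ix, iy, drop = FALSE])
}

# Hypothetical permutation test: shuffle the pooled rows, recompute the statistic.
permutation_mmd_test <- function(x, y, iterations = 10^3) {
  K <- rbf_gram(rbind(x, y))  # computed once, reused for every permutation
  m <- nrow(x)
  N <- m + nrow(y)
  observed <- mmd2_from_gram(K, seq_len(m), (m + 1):N)
  null_stats <- replicate(iterations, {
    idx <- sample.int(N)
    mmd2_from_gram(K, idx[seq_len(m)], idx[-seq_len(m)])
  })
  list(statistic = observed, p.value = mean(null_stats >= observed))
}
```

Computing the pooled Gram matrix once and only permuting indices keeps each iteration cheap; with few iterations the p-value is coarse (its resolution is 1 / iterations), which is why the examples suggest increasing iterations for more accurate results.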

Value

A list containing the following components:

  • statistic: the value of the test statistic.

  • p.value: the p-value of the test.

Details

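This computes either the unbiased statistic MMD^2_u or the linear-time statistic MMD^2_l from Gretton et al. (2012). Kernel evaluations rely on the pairwise_kernels function from the Python module sklearn (scikit-learn), so kernel must be one of the names that function accepts. The available kernel names are those returned by sklearn.metrics.pairwise.kernel_metrics().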
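As an illustration of the two statistics, here is a minimal base-R sketch using an RBF kernel. It follows the estimators in Gretton et al. (2012) but is not the package's code, which calls scikit-learn; the helper names rbf_kernel, mmd2_unbiased and mmd2_linear, the gamma default, and the equal-sample-size requirement of the linear version are assumptions made for the sketch.

```r
# Hypothetical helper: RBF kernel matrix k(u, v) = exp(-gamma * ||u - v||^2)
# between the rows of a and b. gamma = 1 / ncol(a) is an assumed default.
rbf_kernel <- function(a, b, gamma = 1 / ncol(a)) {
  sq_dist <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  exp(-gamma * pmax(sq_dist, 0))  # clamp tiny negative rounding errors
}

# Unbiased MMD^2_u: the within-sample averages exclude the i == j terms.
mmd2_unbiased <- function(x, y, gamma = 1 / ncol(x)) {
  m <- nrow(x)
  n <- nrow(y)
  kxx <- rbf_kernel(x, x, gamma)
  kyy <- rbf_kernel(y, y, gamma)
  (sum(kxx) - sum(diag(kxx))) / (m * (m - 1)) +
    (sum(kyy) - sum(diag(kyy))) / (n * (n - 1)) -
    2 * mean(rbf_kernel(x, y, gamma))
}

# Linear-time MMD^2_l: mean of h over disjoint consecutive row pairs, where
# h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1).
# This sketch assumes both samples have the same number of rows.
mmd2_linear <- function(x, y, gamma = 1 / ncol(x)) {
  stopifnot(nrow(x) == nrow(y))
  half <- nrow(x) %/% 2
  i1 <- 2 * seq_len(half) - 1
  i2 <- 2 * seq_len(half)
  k_rows <- function(a, b) exp(-gamma * rowSums((a - b)^2))  # paired rows only
  h <- k_rows(x[i1, , drop = FALSE], x[i2, , drop = FALSE]) +
    k_rows(y[i1, , drop = FALSE], y[i2, , drop = FALSE]) -
    k_rows(x[i1, , drop = FALSE], y[i2, , drop = FALSE]) -
    k_rows(x[i2, , drop = FALSE], y[i1, , drop = FALSE])
  mean(h)
}
```

The unbiased version needs kernel values for all pairs of rows, while the linear version uses only about half as many kernel evaluations as there are rows; this cost difference is presumably why 'linear' becomes the default for large samples. Both estimators can be negative when the two distributions are close, as in the example outputs.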

References

Gretton, A., Borgwardt, K., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A Kernel Two-Sample Test. Journal of Machine Learning Research, 13, 723-773.

Examples

x <- matrix(rnorm(1000, 0, 1), ncol = 10)
y <- matrix(rnorm(1000, 0, 2), ncol = 10)
mmd_test(x, y)
#> $statistic
#> [1] 0.0002247507
#>
#> $p.value
#> [1] 0.001
#>
mmd_test(x, y, type = "linear")
#> $statistic
#> [1] -0.001957033
#>
#> $p.value
#> [1] 0.569
#>
x <- matrix(rnorm(1000, 0, 1), ncol = 10)
y <- matrix(rnorm(1000, 0, 1), ncol = 10)
# Set iterations to small number for runtime
# Increase for more accurate results
mmd_test(x, y, iterations = 10^2)
#> $statistic
#> [1] -8.727474e-06
#>
#> $p.value
#> [1] 0.53
#>