Herr Strathmann.

3rd GSoC weekly report: New threshold methods for quadratic MMD

I finally finished my exams last week and can now work full-time for GSoC and my Master project. Puh! :)

Last week, I implemented two methods to compute a threshold for the quadratic time MMD test:

  1. A test based on the Eigenspectrum of the kernel matrix of the joint samples. This is a nice and computationally efficient idea from [1]. The basic idea is that for the case \(P=Q\), the biased estimate of the null-distribution converges in distribution:
    \[ m\text{MMD}^2_b \rightarrow \sum_{l=1}^\infty \lambda_l z_l^2\]
    where \(z_l \sim \mathcal{N}(0,2)\) i.i.d. and  \(\lambda_l\) are Eigenvalues of which the empirical estimates \(\hat{\lambda}_l\) are given by the Eigenvalues of the centred kernel matrix \(\tilde{K}=HKH\) where \(K_{ij}=k(x_i,x_j)\). Its possible to sample the null-distribution using these estimates and to compute a p-value or threshold using the resulting samples.

  2. A heuristic method, also from [1],  that approximates the null-distribution with a gamma-distribution where the first two moments are matched. I.e.
    \[m\text{MMD}_b(Z) \sim \frac{x^{\alpha-1}\exp(-\frac{x}{\beta})}{\beta^\alpha \Gamma(\alpha)}\] where \[ \alpha=\frac{(\textbf{E}(\text{MMD}_b(Z)))^2}{\text{var}(\text{MMD}_b(Z))}\quad \text{and}\quad \beta=\frac{m\text{var}(\text{MMD}_b(Z))}{(\textbf{E}(\text{MMD}_b(Z)))^2}\]

Both methods need some some distribution function (\(\Gamma\), etc), which I integrated from ALGLIB. I added some tests to ensure that results equal to these obtained with MATLAB. Along with that come some SGVector based wrappers for SHOGUN functions (sorting, eigenvalues, etc).

Next week, I will do some fine-tuning on the implemented methods and then create tests which illustrate all methods.

[1]: Gretton, A., Fukumizu, K., & Harchaoui, Z. (2011). A fast, consistent kernel two-sample test.