I finally finished my exams last week and can now work full-time on GSoC and my Master's project. Phew! 🙂
Last week, I implemented two methods to compute a threshold for the quadratic time MMD test:
- A test based on the Eigenspectrum of the kernel matrix of the joint samples. This is a nice and computationally efficient idea from [1]. The basic idea is that for the case \(P=Q\), the biased MMD estimate, scaled by the sample size \(m\), converges in distribution:
\[ m\text{MMD}^2_b \rightarrow \sum_{l=1}^\infty \lambda_l z_l^2\]
where \(z_l \sim \mathcal{N}(0,2)\) i.i.d. and the \(\lambda_l\) are Eigenvalues whose empirical estimates \(\hat{\lambda}_l\) are obtained from the Eigenvalues of the centred kernel matrix \(\tilde{K}=HKH\), where \(K_{ij}=k(x_i,x_j)\) and \(H\) is the centring matrix. It's possible to sample the null-distribution using these estimates and to compute a p-value or threshold using the resulting samples.
- A heuristic method, also from [1], that approximates the null-distribution with a gamma-distribution where the first two moments are matched. I.e.
\[m\text{MMD}_b(Z) \sim \frac{x^{\alpha-1}\exp(-\frac{x}{\beta})}{\beta^\alpha \Gamma(\alpha)}\] where \[ \alpha=\frac{(\textbf{E}(\text{MMD}_b(Z)))^2}{\text{var}(\text{MMD}_b(Z))}\quad \text{and}\quad \beta=\frac{m\,\text{var}(\text{MMD}_b(Z))}{\textbf{E}(\text{MMD}_b(Z))}\]
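To make the moment matching concrete, here is a minimal Python sketch of the gamma approximation. It is not the SHOGUN implementation: the function names are made up for illustration, the mean and variance of the statistic under the null are assumed to be given (their estimators from [1] are not shown), and the threshold is read off from simulated gamma draws instead of an inverse CDF so that only the standard library is needed.

```python
import random


def gamma_params(mean_mmd, var_mmd, m):
    """Match the first two moments of m*MMD_b to a gamma distribution.

    mean_mmd and var_mmd are the (assumed given) mean and variance of
    MMD_b under the null hypothesis, m is the number of samples.
    """
    alpha = mean_mmd ** 2 / var_mmd   # shape parameter
    beta = m * var_mmd / mean_mmd     # scale parameter
    return alpha, beta


def gamma_threshold(alpha, beta, test_level=0.05, num_samples=20000, seed=0):
    """Approximate the (1 - test_level) quantile of Gamma(alpha, beta).

    Uses plain Monte Carlo: draw samples, sort them, pick the quantile.
    random.gammavariate takes the shape first and the scale second.
    """
    rng = random.Random(seed)
    samples = sorted(rng.gammavariate(alpha, beta) for _ in range(num_samples))
    index = min(int((1.0 - test_level) * num_samples), num_samples - 1)
    return samples[index]


# Hypothetical numbers, only to show the call sequence.
alpha, beta = gamma_params(mean_mmd=0.02, var_mmd=0.0001, m=100)
threshold = gamma_threshold(alpha, beta, test_level=0.05)
```

The null hypothesis \(P=Q\) would then be rejected whenever the observed \(m\text{MMD}_b\) exceeds `threshold`. A quick sanity check on the parameters is that \(\alpha\beta\) reproduces the mean of \(m\text{MMD}_b\) and \(\alpha\beta^2\) its variance.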
Both methods need some distribution functions (\(\Gamma\), etc.), which I integrated from ALGLIB. I added some tests to ensure that the results match those obtained with MATLAB. Along with that come some SGVector-based wrappers for SHOGUN functions (sorting, eigenvalues, etc.).
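The spectrum-based method from the first bullet needs exactly these building blocks (eigenvalues, sorting for quantiles), so it can be sketched in a few lines of NumPy. This is only an illustration under some assumptions, not the SHOGUN code: the function names are hypothetical, a Gaussian kernel is used as an example, \(\mathcal{N}(0,2)\) is read as variance 2, and the eigenvalues of \(\tilde{K}\) are divided by the size of the kernel matrix to obtain \(\hat{\lambda}_l\), which is one common normalisation.

```python
import numpy as np


def gaussian_kernel(Z, width):
    """Gaussian kernel matrix K_ij = exp(-||z_i - z_j||^2 / (2 width^2))."""
    sq_norms = np.sum(Z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * width ** 2))


def sample_null_spectrum(K, num_samples=1000, num_eigenvalues=None, seed=0):
    """Draw samples from sum_l lambda_l * z_l^2 with z_l ~ N(0, 2)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    K_centred = H @ K @ H

    # eigvalsh returns ascending eigenvalues of a symmetric matrix;
    # clip tiny negative values caused by rounding, then sort descending.
    eigenvalues = np.clip(np.linalg.eigvalsh(K_centred), 0.0, None)[::-1]
    if num_eigenvalues is not None:
        eigenvalues = eigenvalues[:num_eigenvalues]

    # Assumption: normalise by the matrix size to estimate lambda_l.
    lambdas = eigenvalues / n

    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, np.sqrt(2.0), size=(num_samples, lambdas.size))
    return (z ** 2) @ lambdas


def p_value(null_samples, statistic):
    """Fraction of null samples that exceed the observed statistic."""
    return float(np.mean(null_samples > statistic))


def threshold(null_samples, test_level=0.05):
    """(1 - test_level) quantile of the sampled null distribution."""
    return float(np.quantile(null_samples, 1.0 - test_level))
```

Given a kernel matrix `K` and an observed statistic, `p_value(sample_null_spectrum(K), statistic)` yields an approximate p-value. Using only the largest few eigenvalues via `num_eigenvalues` trades accuracy for speed.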
Next week, I will do some fine-tuning on the implemented methods and then create tests which illustrate all methods.
[1]: Gretton, A., Fukumizu, K., & Harchaoui, Z. (2011). A fast, consistent kernel two-sample test.