# Herr Strathmann.

## 3rd GSoC weekly report: New threshold methods for quadratic MMD

I finally finished my exams last week and can now work full-time for GSoC and my Master project. Puh! :)

Last week, I implemented two methods to compute a threshold for the quadratic time MMD test:

1. A test based on the Eigenspectrum of the kernel matrix of the joint samples. This is a nice and computationally efficient idea from [1]. The basic idea is that for the case $P=Q$, the biased estimate of the null-distribution converges in distribution:
$m\text{MMD}^2_b \rightarrow \sum_{l=1}^\infty \lambda_l z_l^2$
where $z_l \sim \mathcal{N}(0,2)$ i.i.d. and  $\lambda_l$ are Eigenvalues of which the empirical estimates $\hat{\lambda}_l$ are given by the Eigenvalues of the centred kernel matrix $\tilde{K}=HKH$ where $K_{ij}=k(x_i,x_j)$. Its possible to sample the null-distribution using these estimates and to compute a p-value or threshold using the resulting samples.

2. A heuristic method, also from [1],  that approximates the null-distribution with a gamma-distribution where the first two moments are matched. I.e.
$m\text{MMD}_b(Z) \sim \frac{x^{\alpha-1}\exp(-\frac{x}{\beta})}{\beta^\alpha \Gamma(\alpha)}$ where $\alpha=\frac{(\textbf{E}(\text{MMD}_b(Z)))^2}{\text{var}(\text{MMD}_b(Z))}\quad \text{and}\quad \beta=\frac{m\text{var}(\text{MMD}_b(Z))}{(\textbf{E}(\text{MMD}_b(Z)))^2}$

Both methods need some some distribution function ($\Gamma$, etc), which I integrated from ALGLIB. I added some tests to ensure that results equal to these obtained with MATLAB. Along with that come some SGVector based wrappers for SHOGUN functions (sorting, eigenvalues, etc).

Next week, I will do some fine-tuning on the implemented methods and then create tests which illustrate all methods.

[1]: Gretton, A., Fukumizu, K., & Harchaoui, Z. (2011). A fast, consistent kernel two-sample test.