Herr Strathmann.

5th GSoC weekly report: Almost finished MMD tests

Last week, I have been working silently (no Internet connection) on finishing all MMD related tests. What is new is a distinction between biased/unbiased test statistics for the quadratic MMD, and the Gaussian approximation of the null-distribution for the linear time MMD - and more tests.

Since the linear time MMD, defined as
\[\text{MMD}_l^2=\frac{1}{m}\sum_{i=1}^{m}h((x_{2i-1},y_{2i-1}),(x_{2i},y_{2i}))\]
where
\[h((x_{i},y_{i}),((x_{j},y_{j}))=k(x_{i},x_{i})+k(y_{i},y_{i})-k(x_{i},y_{j})-k(x_{j},y_{i})\]
is normally distributed both in null- and alternative distribution, one can easily compute test-threshold and p-values using the variance of the h-values of the above term as a proxy for the real variance (The null-distribution has zero mean). For large sample sizes, this is an accurate and very cheap way to perform the test: The statistic has to be computed only once whereas bootstrapping would need a few hundred iterations. Another reason why the linear time MMD is well suited for large scale problems.

I also started on integrating my code into the modular interfaces of SHOGUN and will produce some first python examples next week.

Jacob (Gaussian Process Regression project), who uses and extends parts of my model selection code from last years GSoC has found a serious problem in SHOGUN's parameter trees for model selection. I hope to fix it this week - complicated.

When all mentioned things are done, I might start with dependence testing next week.