I am currently quite busy with exams, but the last three will be done soon. In the meantime, I still managed to make initial sketches for the statistical testing framework and to help solve problems caused by the massive changes currently happening to SHOGUN’s label and multi-class system.
Here you can find a UML diagram of the class structure so far. I implemented the first simple kernel two-sample tests: the ones based on the linear-time and the quadratic-time MMD statistic. To compute a p-value, both can approximate their null distribution with a (brute-force) bootstrapping approach: the samples from the two underlying distributions are pooled and shuffled, and the statistic is recomputed many times. The bootstrapping code will work for any two-sample test.
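To make the bootstrapping idea concrete, here is a minimal sketch in plain Python rather than SHOGUN's C++. It is not the SHOGUN API: the function names, the Gaussian kernel, and the choice of the unbiased quadratic-time estimator are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def quadratic_time_mmd(xs, ys, kernel=gaussian_kernel):
    """Unbiased quadratic-time estimate of the squared MMD (assumed variant)."""
    m, n = len(xs), len(ys)
    # Within-sample terms skip the diagonal (i == j) to keep the estimate unbiased.
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def bootstrap_p_value(xs, ys, statistic, num_iterations=200, rng=None):
    """Approximate the null distribution by shuffling the pooled samples.

    `statistic` is any callable taking two samples, so the same routine
    works for any two-sample statistic, not only the MMD.
    """
    rng = rng or random.Random()
    observed = statistic(xs, ys)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(num_iterations):
        rng.shuffle(pooled)
        # Split the shuffled pool back into two samples of the original sizes.
        if statistic(pooled[:m], pooled[m:]) >= observed:
            exceed += 1
    # Fraction of shuffled statistics at least as large as the observed one.
    return exceed / num_iterations
```

A call such as `bootstrap_p_value(xs, ys, quadratic_time_mmd)` returns the fraction of shuffled statistics that reach the observed value. Each iteration recomputes the full statistic, which is why this approach is brute-force.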
Next steps: more advanced methods for estimating the null distributions of the MMD tests.
I also worked with Arthur (my mentor) on a version of the MMD that is related to my Master's project: a convex combination of (arbitrary) kernels for the linear-time MMD, where the optimal weights are learned by solving a quadratic program. I might implement that in SHOGUN as well. (Can anyone help me with interfacing SHOGUN's QP solver?)
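As a rough illustration of the combined-kernel idea, the sketch below builds a convex combination of kernels and plugs it into a linear-time MMD estimate. It is plain Python, not SHOGUN code. The helper names, the Gaussian base kernels, and the pairing scheme of the linear-time estimate are assumptions, and the weights are fixed by hand here: the quadratic program that learns them is not shown.

```python
import math


def make_gaussian_kernel(sigma):
    """Return a Gaussian kernel with the given bandwidth (hypothetical helper)."""
    def kernel(x, y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel


def convex_combination(kernels, weights):
    """Combine kernels with non-negative weights that sum to one."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    def combined(x, y):
        return sum(w * k(x, y) for w, k in zip(weights, kernels))
    return combined


def linear_time_mmd(xs, ys, kernel):
    """Linear-time MMD estimate: average over disjoint consecutive pairs."""
    m = min(len(xs), len(ys)) // 2
    total = 0.0
    for i in range(m):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        total += kernel(x1, x2) + kernel(y1, y2) - kernel(x1, y2) - kernel(x2, y1)
    return total / m
```

Because this estimate is linear in the kernel, the statistic of the combined kernel equals the weighted sum of the per-kernel statistics, so each base kernel's statistic only needs to be computed once while the weights are varied.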