# 7th GSoC weekly report: Hilbert Schmidt Independence Criterion

Finally, I started on kernel based (in)dependence tests last week. These are tests that try to find out whether for two random variables $\textbf{x},\textbf{y}$ are independent, i.e. whether their joint distribution factorises into the individual ones. The null hypothesis (that may be rejected) is $H_0:P_\textbf{x}P_\textbf{y}=P_{\textbf{x}\textbf{y}}$

These kind of tests basically work like two-sample tests: Given one set of samples from each random variable
$Z=(X,Y)=\{(x_1,y_1,…,(x_m,y_m)\}$
a test statistic is computed and then compared against the distribution of the statistic under the null-hypothesis. If the position is in an upper part of it, the null-hypothesis is rejected since it is unlikely that the current value was generated by it.

The class of independence tests I will implement for my project are all based on the Hilbert Schmidt independence criterion (HSIC), which takes out the above procedure to an reproducing kernel Hilbert space (RKHS). The (biased version of the) HSIC statistic itself is simply given by
$\text{HSIC}_b(Z)=\frac{1}{m^2}\text{trace}(KHLH)$
where $K,L$ are kernel matrices of the input samples $X,Y$ in some RKHS and $H=I-\frac{1}{m}\textbf{1}\textbf{1}^T$ is a centring matrix.

I integrated a general modular framework for independence tests into SHOGUN. The HSIC class is the first kernel-independence test that works. Interfaces are very similar to the two-sample test, however, they are not quite the same for various reasons. That’s why there is another class for independence testing next to the one for two-sample testing.

As for the two-sample tests, the null-distribution may simply be approximated by bootstrapping, i.e. merging the samples and computing the statistic for many times. This is now possible for any independence test. Another method to approximate the null-distribution for HSIC is fitting a Gamma distribution [1] as

$m\text{HSIC}=\frac{x^{\alpha-1}\exp(-\frac{x}{\beta})}{\beta^\alpha \Gamma(\alpha)}$ where
$\alpha=\frac{(\textbf{E}(\text{HSIC}_b(Z)))^2}{\text{var}(\text{HSIC}_b(Z))} \quad \text{and}\quad\beta=\frac{m\text{var}(\text{HSIC}_b(Z))}{\textbf{E}(\text{HSIC}_b(Z))}$

It’s also already implemented! There are already modular interfaces for the new classes and some simple tests. I will extend these during this weak. Time passes fast: The mid-term-evaluation is this week already. I pretty much enjoyed the first half ๐

[1]: Gretton, A., Fukumizu, K., Teo, C., & Song, L. (2008). A kernel statistical test of independence.

# 6th GSoC weekly report: First modular examples and other stuff

Last week’s changes were all rather subtle:

• I created some first modular examples in python,
• fixed this big bug in the model selection trees I talked about last week (nasty!),
• added some convenience methods for the two-sample-test constructors (there is now a new method in CFeatures to append feature objects)
• and corrected a bunch of bugs on the fly.

This week, I will do some more work on the examples and then start working on independence testing.

# 5th GSoC weekly report: Almost finished MMD tests

Last week, I have been working silently (no Internet connection) on finishing all MMD related tests. What is new is a distinction between biased/unbiased test statistics for the quadratic MMD, and the Gaussian approximation of the null-distribution for the linear time MMD – and more tests.

Since the linear time MMD, defined as
$\text{MMD}_l^2=\frac{1}{m}\sum_{i=1}^{m}h((x_{2i-1},y_{2i-1}),(x_{2i},y_{2i}))$
where
$h((x_{i},y_{i}),((x_{j},y_{j}))=k(x_{i},x_{i})+k(y_{i},y_{i})-k(x_{i},y_{j})-k(x_{j},y_{i})$
is normally distributed both in null- and alternative distribution, one can easily compute test-threshold and p-values using the variance of the h-values of the above term as a proxy for the real variance (The null-distribution has zero mean). For large sample sizes, this is an accurate and very cheap way to perform the test: The statistic has to be computed only once whereas bootstrapping would need a few hundred iterations. Another reason why the linear time MMD is well suited for large scale problems.

I also started on integrating my code into the modular interfaces of SHOGUN and will produce some first python examples next week.

Jacob (Gaussian Process Regression project), who uses and extends parts of my model selection code from last years GSoC has found a serious problem in SHOGUN’s parameter trees for model selection. I hope to fix it this week – complicated.

When all mentioned things are done, I might start with dependence testing next week.

# 4th GSoC weekly report: Tests, tests, tests…

Since all threshold methods that I implemented so far are quite unintuitive and hard to verify, I spent last week writing tests for most of their parts. I created test-cases based on fixed and random datasets (mainly taken from the papers where the methods were introduced), in which results are compared against MATLAB implementations of the same problem. For most cases, the difference is asserted to be smaller than 10E-15.

Although it takes a lot of time and effort, I believe these automatic tests should come with every more complicated method in SHOGUN. I remember quite some cases where I spent an hour on finding an error that would have been easily caught by a test.

I hope to finish all all statistical tests based on the quadratic MMD this week. I still need to implement the biased version (which can be used along with the threshold methods) and a possibility for choosing the type of statistic to use. I also plan to implement the linear time MMD and corresponding tests (Gaussian approximation of null-distribution). I won’t have Internet access this week since I am in nowhere in Poland — so expect a larger patch at the end of the week.

# 3rd GSoC weekly report: New threshold methods for quadratic MMD

I finally finished my exams last week and can now work full-time for GSoC and my Master project. Puh! ๐

Last week, I implemented two methods to compute a threshold for the quadratic time MMD test:

1. A test based on the Eigenspectrum of the kernel matrix of the joint samples. This is a nice and computationally efficient idea from [1]. The basic idea is that for the case $P=Q$, the biased estimate of the null-distribution converges in distribution:
$m\text{MMD}^2_b \rightarrow \sum_{l=1}^\infty \lambda_l z_l^2$
where $z_l \sim \mathcal{N}(0,2)$ i.i.d. andย  $\lambda_l$ are Eigenvalues of which the empirical estimates $\hat{\lambda}_l$ are given by the Eigenvalues of the centred kernel matrix $\tilde{K}=HKH$ where $K_{ij}=k(x_i,x_j)$. Its possible to sample the null-distribution using these estimates and to compute a p-value or threshold using the resulting samples.
2. A heuristic method, also from [1],ย  that approximates the null-distribution with a gamma-distribution where the first two moments are matched. I.e.
$m\text{MMD}_b(Z) \sim \frac{x^{\alpha-1}\exp(-\frac{x}{\beta})}{\beta^\alpha \Gamma(\alpha)}$ where $\alpha=\frac{(\textbf{E}(\text{MMD}_b(Z)))^2}{\text{var}(\text{MMD}_b(Z))}\quad \text{and}\quad \beta=\frac{m\text{var}(\text{MMD}_b(Z))}{(\textbf{E}(\text{MMD}_b(Z)))^2}$

Both methods need some some distribution function ($\Gamma$, etc), which I integrated from ALGLIB. I added some tests to ensure that results equal to these obtained with MATLAB. Along with that come some SGVector based wrappers for SHOGUN functions (sorting, eigenvalues, etc).

Next week, I will do some fine-tuning on the implemented methods and then create tests which illustrate all methods.

[1]: Gretton, A., Fukumizu, K., & Harchaoui, Z. (2011). A fast, consistent kernel two-sample test.

# 2nd GSoC weekly report

Last week, I was again very busy with exams and doing experiments for a NIPS submission.

The latter is somehow related to my GSoC project, and I will implement it once the other stuff is done:
We developed a method for selecting optimal coefficients of a non-negative combination of kernels for the linear time (=large scale) MMD-two-sample test. The criterion that is optimised for is the ratio of the linear MMD $\eta_k$ by its standard deviation $\sigma_k$, i.e.
$k_*=\arg \sup_{k\in\mathcal{K}} \eta_k \sigma_k^{-1}$. That is equivalent to solving the quadratic program
$\min \{ \beta^T\hat{Q}\beta : \beta^T \hat{\eta}=1, \beta\succeq0\}$
where the combination of kernels is given by
$\mathcal{K}:=\{k : k=\sum_{u=1}^d\beta_uk_u,\sum_{u=1}^d\beta_u\leq D,\beta_u\geq0, \forall u\in\{1,…,d\}\}$
$\hat{Q}$ is a linear time estimate of the covariance of the MMD estimates and $\hat{\eta}$ is a linear time estimate of the MMD using the above kernel combinations.

Apart from that, I implemented a method to approximate the null-distribution of the quadratic time MMD, which is based on the Eigenspectrum of the kernel matrix of the merged samples from the two distributions, based on [1]. It still needs to be compared against the MATLAB implementation. It comes with some minor helper functions around matrix algebra.

This week, I will finally have my last exam and then continue on the advanced methods for computing test thresholds.

[1]: Gretton, A., Fukumizu, K., & Harchaoui, Z. (2011). A fast, consistent kernel two-sample test.

# First GSoC weekly report

I am currently quite busy with my exams, however, the last three will be done soon and I still managed to do initial sketches for the statistical testing framework along with helping on solving problems that occurred because of the massive changes that are currently happening to SHOGUN’s label and multi-class system.

Here you can find an UML diagram of the class structure so far. I implemented first simple kernel-two-sample-tests — the ones based on the linear and the quadratic time MMD metric. For computing a p-value, these two may approximate their null-distribution using a (brute-force) bootstrapping approach based on shuffling data of the two underlying distributions and then computing the statistic multiple times. The bootstrapping code will work for any two-sample based test.

Next steps are: Advanced methods for estimating Null-distributions for the MMD tests.

I also worked with Arthur (mentor) on a version of the MMD that is related to my Master project: A convex combination of (arbritary) kernels for the linear time MMD where the optimal weights are learned by solving a quadratic program. I might implement that into SHOGUN as well. (Who can help me how to interface the QP-solver of SHOGUN?)

# GSoC 2012

I participated the GSoC 2012 for SHOGUN! The project I worked on was closely related to my Master’s project at UCL. It is about kernel based statistical tests. My host ist Arthur Gretton, lecturer with the Gatsby Computational Neuroscience Unit, part of the Centre for Computational Statistics and Machine Learning at UCL, who I met there during my studies.
Abstract: Statistical tests for dependence or difference are an important tool in data-analysis. However, when data is high-dimensional or in non-numerical form (strings, graphs), classical methods fail. This project implements recently developed kernel-based generalizations of statistical tests, which overcome this issue. The kernel-two-sample test based on the Maximum-Mean-Discrepancy (MMD) tests whether two sets of samples are from the same or from different distributions. Related to the kernel-two-sample test is the Hilbert-Schmidt-Independence criterion (HSIC), which tests for statistical dependence between two sets of samples. Multiple tests based on the MMD and the HSIC are implemented along with a general framework for statistical tests in SHOGUN.

My proposal can be found here. SHOGUN got 8 student slots, compared to 5 in 2011, so this summer was a major boost in SHOGUN development. Check out the cool others’ students projects here.

# Accepted!

Yeah! I just got accepted into the GSoC 2012 for SHOGUN! The project I will work on this year is closely related to my Master’s project at UCL. It is about kernel based statistical tests. My host ist Arthur Gretton, lecturer with the Gatsby Computational Neuroscience Unit, part of the Centre for Computational Statistics and Machine Learning at UCL, who I met there during my studies.
Abstract: Statistical tests for dependence or difference are an important tool in data-analysis. However, when data is high-dimensional or in non-numerical form (strings, graphs), classical methods fail. This project implements recently developed kernel-based generalizations of statistical tests, which overcome this issue. The kernel-two-sample test based on the Maximum-Mean-Discrepancy (MMD) tests whether two sets of samples are from the same or from different distributions. Related to the kernel-two-sample test is the Hilbert-Schmidt-Independence criterion (HSIC), which tests for statistical dependence between two sets of samples. Multiple tests based on the MMD and the HSIC are implemented along with a general framework for statistical tests in SHOGUN.

My proposal can be found here. I am looking extremely forward to this. This year, SHOGUN got 8 student slots, compared to 5 last year, so this summer will probably bring a major boost in SHOGUN development. Check out the cool others’ students projects here.