Accepted!


Yeah! I just got accepted into GSoC 2012 for SHOGUN! The project I will work on this year is closely related to my Master’s project at UCL. It is about kernel-based statistical tests. My host is Arthur Gretton, lecturer with the Gatsby Computational Neuroscience Unit, part of the Centre for Computational Statistics and Machine Learning at UCL, whom I met there during my studies.
Abstract: Statistical tests for dependence or difference are an important tool in data analysis. However, when data is high-dimensional or in non-numerical form (strings, graphs), classical methods fail. This project implements recently developed kernel-based generalizations of statistical tests, which overcome this issue. The kernel two-sample test based on the Maximum Mean Discrepancy (MMD) tests whether two sets of samples are from the same or from different distributions. Related to the kernel two-sample test is the Hilbert-Schmidt Independence Criterion (HSIC), which tests for statistical dependence between two sets of samples. Multiple tests based on the MMD and the HSIC are implemented along with a general framework for statistical tests in SHOGUN.
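
To give a rough idea of the first of these quantities: the biased empirical estimate of the squared MMD only needs the kernel matrices within and between the two sample sets. The little NumPy sketch below is not SHOGUN code, just an illustration, and the Gaussian kernel and its bandwidth are example choices:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased empirical estimate of MMD^2 between samples X ~ p and Y ~ q."""
    K_xx = gaussian_kernel(X, X, sigma)
    K_yy = gaussian_kernel(Y, Y, sigma)
    K_xy = gaussian_kernel(X, Y, sigma)
    return K_xx.mean() + K_yy.mean() - 2 * K_xy.mean()

# Samples from the same distribution give an estimate close to zero,
# samples from different distributions a clearly positive one.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))
print(mmd2_biased(X, Y))
```

The actual test then asks whether the observed statistic is larger than what one would expect if both sample sets came from the same distribution.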

My proposal can be found here. I am really looking forward to this. This year, SHOGUN got 8 student slots, compared to 5 last year, so this summer will probably bring a major boost in SHOGUN development. Check out the other students’ cool projects here.


GSoC 2011

I participated in GSoC 2011 for the SHOGUN machine learning toolbox (link). This was awesome! The program brings together students (like me) and open-source organisations. You get paid to work full-time on a project of your choice. I could really use the money, and I learned lots of cool things and met nice people.

My project was mentored by Soeren Sonnenburg and had the title “Built a flexible cross-validation framework into shogun”. Here is the abstract:
Nearly every learning machine has parameters which have to be determined manually. Shogun currently lacks a model selection framework. Therefore, the goal of this project is to extend shogun to make cross-validation possible. Different strategies for how the training data is split up should be available and easy to exchange. Various model selection schemes are integrated (train/validation/test split, n-fold cross-validation, leave-one-out cross-validation, etc.).
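
Just to illustrate the basic idea (this is plain Python/NumPy, not the interface that ended up in SHOGUN, where splitting strategy, learning machine, and evaluation criterion are separate, exchangeable objects): n-fold cross-validation partitions the data into folds, trains on all but one fold, evaluates on the held-out fold, and averages the results. The callables train_fn, predict_fn and loss_fn are placeholders for whatever learning machine and evaluation one plugs in:

```python
import numpy as np

def n_fold_cross_validation(X, y, train_fn, predict_fn, loss_fn, n_folds=5, seed=0):
    """Generic n-fold cross-validation: split the data into n folds,
    train on n-1 of them, evaluate on the held-out fold, and average."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    folds = np.array_split(indices, n_folds)
    losses = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        model = train_fn(X[train_idx], y[train_idx])
        predictions = predict_fn(model, X[test_idx])
        losses.append(loss_fn(y[test_idx], predictions))
    return np.mean(losses)
```

Swapping in a different splitting scheme (stratified splits, leave-one-out) only changes how the index folds are built, which is exactly the kind of exchangeability the framework aims for.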


The proposal I wrote can be found here. My motivation for the project came from the fact that I actually used SHOGUN for my Bachelor thesis (link). Back then, I had to do model selection by hand. A major portion of the programming work I did would not have been necessary if model selection had already been part of SHOGUN. Nowadays, quite a few people use the code I wrote during the summer of 2011.