Herr Strathmann.

GSoC 2013

Shogun has been accepted into the Google Summer of Code 2013!

To read my blog about the GSoC, click here.

Check out our ideas page. This year, I will be a mentor rather than a student, and I am very excited about this.
I'll be offering two projects:

  • Implement Gaussian process classification (joint with Oliver Stegle). This is an extension of last year's GSoC project and should be quite interesting while not being too complicated (link)
  • Implement unbiased estimators of likelihoods of very large, sparse Gaussian distributions (joint with Erlend Aune and Daniel Simpson). This one is quite challenging, since it involves many different topics. However, it should also be very interesting (link)

SHOGUN - A large scale machine learning toolbox

To read my blog about SHOGUN development, click here.

SHOGUN (website) is a machine learning toolbox with a focus on large-scale kernel methods, especially Support Vector Machines (SVMs). It provides a generic SVM interface to several different state-of-the-art SVM implementations.

Each of the SVMs can be combined with a variety of kernels. The toolbox provides efficient implementations of many common kernels.

Many other popular machine learning algorithms are implemented as well, and the list keeps growing, not least thanks to the Google Summer of Code. There are now Gaussian processes, many dimensionality reduction methods, structured output and latent SVMs, various multi-task learning techniques, and much more.

SHOGUN is implemented in C++ and comes with interfaces to many languages.

I joined the team after GSoC 2011 and have since implemented several new features: a framework for cross-validation and model selection during GSoC 2011, and a framework for kernel-based statistical hypothesis testing during GSoC 2012. I also worked on migrating serialized SHOGUN objects between different versions.

GSoC 2012

To read my blog about my participation in GSoC 2012, click here.

I participated in GSoC 2012 for SHOGUN! The project I worked on was closely related to my Master's project at UCL: kernel-based statistical tests. My host was Arthur Gretton, a lecturer at the Gatsby Computational Neuroscience Unit, part of the Centre for Computational Statistics and Machine Learning at UCL, whom I met during my studies there.


Abstract: Statistical tests for dependence or difference are an important tool in data analysis. However, when data is high-dimensional or in non-numerical form (strings, graphs), classical methods fail. This project implements recently developed kernel-based generalizations of statistical tests which overcome this issue. The kernel two-sample test based on the Maximum Mean Discrepancy (MMD) tests whether two sets of samples come from the same or from different distributions. Related to the kernel two-sample test is the Hilbert-Schmidt Independence Criterion (HSIC), which tests for statistical dependence between two sets of samples. Multiple tests based on the MMD and the HSIC are implemented, along with a general framework for statistical tests in SHOGUN.
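To make the two statistics in the abstract a bit more concrete, here is a minimal pure-Python sketch. It is not SHOGUN's implementation or API: the function names, the Gaussian kernel with a fixed bandwidth `sigma=1.0`, the permutation approach to approximating the null distribution, and the 1/n² normalisation of HSIC are all assumptions chosen for illustration. Samples are sequences of numeric tuples.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); sigma is an assumed default."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd2_unbiased(X, Y, kernel=gaussian_kernel):
    """Unbiased estimate of the squared MMD between samples X and Y (diagonal terms excluded)."""
    m, n = len(X), len(Y)
    kxx = sum(kernel(X[i], X[j]) for i in range(m) for j in range(m) if i != j)
    kyy = sum(kernel(Y[i], Y[j]) for i in range(n) for j in range(n) if i != j)
    kxy = sum(kernel(x, y) for x in X for y in Y)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2.0 * kxy / (m * n)


def mmd_permutation_test(X, Y, num_permutations=100, seed=0):
    """p-value for H0: X and Y come from the same distribution.

    The null distribution is approximated by reshuffling the pooled samples
    and recomputing the statistic on random splits of the same sizes.
    """
    rng = random.Random(seed)
    observed = mmd2_unbiased(X, Y)
    pooled = list(X) + list(Y)
    count = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:len(X)], pooled[len(X):]) >= observed:
            count += 1
    return (count + 1) / (num_permutations + 1)


def hsic_biased(X, Y, kx=gaussian_kernel, ky=gaussian_kernel):
    """Biased HSIC estimate (1/n^2) trace(K H L H) for paired samples (X[i], Y[i])."""
    n = len(X)
    K = [[kx(a, b) for b in X] for a in X]
    L = [[ky(a, b) for b in Y] for a in Y]

    def center(M):
        # H M H with H = I - (1/n) 1 1^T: subtract row and column means, add back the grand mean
        row = [sum(r) / n for r in M]
        col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
        grand = sum(row) / n
        return [[M[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

    Kc, Lc = center(K), center(L)
    # trace(Kc Lc) = sum_ij Kc[i][j] * Lc[j][i], and both matrices are symmetric
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2
```

For samples drawn from two well-separated distributions, `mmd_permutation_test` should return a small p-value, and strongly dependent pairs should give a larger HSIC value than independent ones. The quadratic cost of these estimates is one reason large-scale variants matter in practice.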

My proposal can be found here. SHOGUN got 8 student slots, compared to 5 in 2011, so this summer was a major boost for SHOGUN development. Check out the other students' cool projects here.

GSoC 2011

I participated in GSoC 2011 for the SHOGUN machine learning toolbox (link). This was awesome! The program brings together students (like me) and open-source organisations: students get paid to work full-time on a project of their choice. I could really use the money, learned lots of cool things and met nice people.

My project was mentored by Soeren Sonnenburg and had the title "Built a flexible cross-validation framework into shogun". Here is the abstract:
Nearly every learning machine has parameters which have to be determined manually. Shogun currently lacks a model selection framework. Therefore, the goal of this project is to extend shogun to make cross-validation possible. Different strategies for splitting up the training data should be available and easy to exchange. Various model selection schemes are integrated (train/validation/test split, n-fold cross-validation, leave-one-out cross-validation, etc.).
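The abstract describes exchangeable data-splitting strategies feeding a model selection loop. Here is a minimal pure-Python sketch of that idea, assuming accuracy as the score and a single tunable parameter; it is not SHOGUN's API. All names (`kfold_indices`, `cross_validate`, `select_model`) and the toy `make_knn` learner are hypothetical and only illustrate the structure.

```python
import random


def kfold_indices(n, k, shuffle=True, seed=0):
    """Split indices 0..n-1 into k disjoint folds of (nearly) equal size."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # the first n % k folds get one extra element
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(idx[start:start + s])
        start += s
    return folds


def leave_one_out_indices(n):
    """Leave-one-out is n-fold cross-validation with one sample per fold."""
    return kfold_indices(n, n, shuffle=False)


def cross_validate(X, y, fit, predict, folds):
    """Mean accuracy over folds; fit(X, y) returns a model, predict(model, x) a label."""
    scores = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def select_model(X, y, make_learner, candidates, folds):
    """Grid search: return the candidate parameter with the best cross-validated accuracy."""
    best_param, best_score = None, float("-inf")
    for param in candidates:
        fit, predict = make_learner(param)
        score = cross_validate(X, y, fit, predict, folds)
        if score > best_score:
            best_param, best_score = param, score
    return best_param, best_score


def make_knn(k):
    """Hypothetical example learner: k-nearest-neighbour classifier on 1-D inputs."""
    def fit(X, y):
        return list(zip(X, y))

    def predict(model, x):
        nearest = sorted(model, key=lambda p: abs(p[0] - x))[:k]
        labels = [label for _, label in nearest]
        return max(set(labels), key=labels.count)

    return fit, predict
```

Because the splitting strategy is just a list of index folds, swapping n-fold for leave-one-out only changes the `folds` argument, e.g. `select_model(X, y, make_knn, [1, 3, 5], leave_one_out_indices(len(X)))`.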


The proposal I wrote can be found here. My motivation for the project came from the fact that I had actually used SHOGUN for my Bachelor thesis (link). Back then, I had to do model selection by hand. A major portion of the programming work I did would not have been necessary if model selection had already been part of SHOGUN. Nowadays, quite a few people use the code I wrote during summer 2011.