Apache Mahout Hackathon Berlin
Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town, the idea was born to meet some day and work on Mahout. So why not just announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base - a hacker space in Berlin well suited to host a Hackathon - and quickly got their OK for the event.
As a result, the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees - arriving at varying times: I guess 11 a.m. simply is way too early to get up for your average software developer on a Saturday. A few people were surprised by the venue - especially those who were attending a Hackathon for the very first time and had expected c-base to be some IT company ;)
We started the day by briefly collecting the ideas that everyone wanted to work on. Some attendees needed help using Mahout - their topics included:
- How to use Apache Mahout collaborative filtering with complex models.
- How to use Apache Mahout via a web application.
- How to use classification (mostly focussed on using Naive Bayes from within web applications).
- Is HBase a solution for scalable graph mining algorithms?
- Is there a frequent itemset algorithm that respects temporal changes in patterns?
Those more into Mahout development proposed a slightly different set of topics:
- PLSI and Map/Reduce?
- Build customisable sampling strategies for distributed recommendations.
- Come up with a configuration scheme for Mahout clusterings that is friendlier to use from the Java API.
- Complete the distributed SVD recommender.
Teams of two to three (and more) people formed quickly. First, several user-side questions could be addressed by pairing more experienced Mahout developers with newbie users. Apart from Mahout specifics, more basic questions about getting involved came up as well: contributing to the online documentation, answering questions on the mailing lists, or just providing structured access to existing material that users generally have trouble finding.
Another topic that is overlooked all too often when asking users to contribute to the project is the process of creating, submitting, applying and reviewing patches itself. To anyone deeply involved with free software projects, dealing with patches and the integration of issue tracker and svn with the project mailing lists all seems very obvious. However, even this seemingly basic setup sometimes looks confusing and complex to regular users - this is very common among, but not limited to, people who are just starting to work as software developers.
In the evening people finally started hacking on more sophisticated tasks - working on the first project patches. On Sunday only the really hardcore developers remained - leading to rather focussed work on Mahout improvements, which in the end led to the first patches sent in from the Mahout Hackathon.