Apache Mahout Hackathon Berlin

2011-03-21 21:39
Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town the idea was born to meet some day, work on Mahout. So why not just announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base - a hacker space in Berlin well suited to host a Hackathon - and quickly got their ok for the event.

As a result the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees - arriving at varying times: I guess 11a.m. simply is way too early to get up for your average software developer on a Saturday. I got a few people surprised by the venue - especially those who were attending a Hackathon for the very first time and had expected c-base to be some IT company ;)

We started the day with a brief collection of ideas that everyone wanted to work on: Some needed help to use Mahout - topics included:

  • How to use Apache Mahout collaborative filtering with complex models.
  • How to use Apache Mahout via a web application?
  • How to use classification (mostly focussed on using Naive Bayes from within web applications).
  • Is HBase a solution for scalable graph mining algorithms?
  • Is there a frequent itemset algorithm that respects temporal changes in patterns?


Those more into Mahout development proposed a slightly different set of topics:

  • PLSI and Map/Reduce?
  • Build customisable sampling strategies for distributed recommendations.
  • Come up with a more Java API friendly configuration scheme for Mahout clusterings.
  • Complete the distributed SVD recommender.


Quickly teams of two to three (and more) people formed. First several user side questions could be addressed by mixing more experienced Mahout developers with newbie users. Apart from Mahout specifics also more basic questions of getting involved even by simply contributing to the online documentation, answering questions on the mailing lists or just providing structured access to existing material that users generally have trouble finding.

Another topic that is being overlooked all too when asking users to contribute to the project is the process of creating, submitting, applying and reviewing patches itself: Being deeply involved with free software projects dealing with patches, integration of issue tracker and svn with the project mailing lists all seems very obvious. However even this seemingly basic setup sometimes looks confusing and complex to regular users - that is very common but not limited to people who are just starting to work as software developers.

Thanks to Thilo Fromm for taking the group picture.

In the evening people finally started hacking more sophisticated tasks - working on the first project patches. On Sunday only the really hard core developers remained - leading to a rather focussed work on Mahout improvements which in the end led to first patches sent in from the Mahout Hackathon.

Apache Dinner Berlin

2011-02-28 06:06
Title: Apache Dinner Berlin
Location: Good Morning Vietnam X-Berg
Start Time: 19:00
Date: 2011-02-28


Please contact Simon Willnauer if you would like to attend for further information.

Note to self: svn:ignore usage

2011-02-25 20:47
Putting the information here to make retrieving it a bit easier next time.

When working with svn and some random IDE I'd really love to avoid checking in any files that are IDE specific (project configuration, classpath, etc.). The command to do that:

svn propedit svn:ignore $directory_to_edit


After issuing this command you'll be prompted to enter file patterns for files to ignore or the directory names.

More detailed information in the official documentation on svn:ignore.

Berlin Scrumtisch - February 2011

2011-02-24 21:33
The February Scrumtisch Berlin featured a talk by Lyssa Adkins, famously known for her publications on Coaching Agile teams. A mixture of fifty developers, scrum masters, coaches and product owner as well as one project manager followed Marion Eikmann's invitation. Thanks for organising the event, as well as thank you to Hypoport for providing the venue.

In her one-hour presentation she mainly focussed on two core topics: On the roles agile coaches have to fullfill as well as on the skill set needed by agile coaches. Being an agile coach (as well as being a scrum master, really), entails four very basic roles:

  • Being a bulldozer - that is getting impediments out of the way. That can be as easy (or hard) as getting appropriate equipment for your developers or teaming up with the product owner to communicate with anyone trying to break the sprint.
  • Being a servant leader - for a coach that means to work for the team, to ask the right questions and enable the team, to listen during dailies - instead of asking for additional reports: Most tools are already within the Scrum toolbox. The hard task is to identify them and use them correctly and efficiently.
  • Being a shepherd - that may be as easy as getting people to attend the dailies, or as complex as communicating common values, a common goal.
  • Being the guard of quality and performance - as a coach that means making degrading quality visible - and letting the Scrum team take the decision on how to deal with it.

However in reality each team is different, each sprint is different. So coaching really is similar to bing the river guide: Each trip is different. It is your task to make the team succeed and adapt to differing situations. To get to team to high performance - over and over again.

When talking about coaching what people need is a very specific skill set:

  • To become a successful agile coach it helps to have coaching skills to be able to understand the client, to see their impediments and help the team become better by listening and asking powerful questions.
  • Be a facilitator who organises sessions, meetings, conversations and may mediate in meetings.
  • Have business domain expertise - that is to know process designs, figure out a companies strategy options.
  • Be a great teacher to help people understand the basic concepts. That entails knowledge on course design, most likely design of exercises.
  • Have mentoring skills to guide a team in the process.
  • Know about lean and agile processes and stay up to date.
  • Have the technical skills on what makes development successful, know the extreme programming techniques to help team excel.
  • Have transformational skills - that is be familiar with change management, transformation leadership, knowing how to change organisations.


However being a coach remember that people are influenced not only by what you teach - but mostly by what you do. Most of what being a good coach means is invisible to the uninitiated outsider. It's about living the process you teach, staying true to the principles you want others to implement.

To get there it helps to get inspired, to talk with others in local meetups (just as with any coding practice: Try to find a common forum to exchange ideas and get fresh input). It may help to keep a value journal - not only to track your accomplishments and be able to prove them to higher management, but also to track where to improve yourself.

The talk provided several interesting starting points for exploring further. However with an added exercise sixty minutes covered only the very basic ideas. Especially for the skill sets needed to successfully coach teams it would be very interesting to learn more on what books to read or what courses to attend to learn more on each topic. Ultimately not only agile coaches benefit from being great teachers, mentors or facilitators.

Video is up: Paolo Negri on scaling by one order of magnitude

2011-02-23 21:23

Video is up - Simon Willnauer on Lucene 4 Performance improvements

2011-02-22 21:21

Video is up - Josh Devins on Apache Hadoop at Nokia

2011-02-21 21:19

Call for Presentations Berlin Buzzwords - one more week to go

2011-02-21 20:24
As a little reminder: the Call for presentations of Berlin Buzzwords will close next week on Tuesday, March 1st. Submissions on scalable search, data storage and analysis are all welcome. We are looking for presentations on the core technologies such as Apache Hadoop, CouchDB, Lucene, Redis, Voldemort but also talks on interesting use cases and system architectures.

Tickets are out for sale - don't wait too long to get your early bird ticket.

In addition we are in the process of drafting some additional packages for those of you who would like to bring their non-tech spouse to the conference or need day care facilities for their children. If you are interested in either package please provide feedback on our blog.

Teddy in Amsterdam

2011-02-20 20:16

On his trip from SFO back to Europe teddy spent a few days in Amsterdam. He brought back the following pictures of bikes, cheese and the canals:



Going to Amsterdam usually is pretty darn dangerous: I tend to return with some toys added to my collection of puzzles. Somehow I tend to be drawn to the shop called Gamekeeper selling them even when I do not remember the exact address or even just the street name:



Apache Mahout Meetup Amsterdam

2011-02-19 20:18

Last week I was honoured to be invited as one of the two speakers on Apache Mahout at the Mahout meetup in Amsterdam at JTeams offices. After free beer, cola and pizza Frank Scholten gave an overview of Mahout's clustering capabilities. After a brief introduction to Mahout itself he went into a little more detail on how clustering works in general. After that with a selection of Seinfeld scripts he used a fun data set to guide the audience through the process of choosing the right data preparation steps, coming up with good training parameters and finally evaluating clustering quality.

After that I gave a brief introduction to classification with Mahout - going into a little more detail when it comes to data preparation and quality evaluation. The audience seemed most interested in learning more on how data preparation works - after all that step cannot really be covered by Mahout itself (though we do have some support) but instead needs a lot of domain knowledge from the user side.

Judging from the brief round of self introductions the meetup was well visited by an intesting mixture of people coming from JTeam, Hippo, the dutch police working on data analytics, developers working at RIPE and many more.

If you are interested in more data analysis, search and data storage - do not miss registration for Berlin Buzzwords on June 6/7th 2011.