Last week I was honoured to be invited as one of the two speakers on Apache Mahout at the Mahout meetup in Amsterdam at JTeams offices. After free beer, cola and pizza Frank Scholten gave an overview of Mahout's clustering capabilities. After a brief introduction to Mahout itself he went into a little more detail on how clustering works in general. After that with a selection of Seinfeld scripts he used a fun data set to guide the audience through the process of choosing the right data preparation steps, coming up with good training parameters and finally evaluating clustering quality.
After that I gave a brief introduction to classification with Mahout - going into a little more detail when it comes to data preparation and quality evaluation. The audience seemed most interested in learning more on how data preparation works - after all that step cannot really be covered by Mahout itself (though we do have some support) but instead needs a lot of domain knowledge from the user side.
Judging from the brief round of self introductions the meetup was well visited by an intesting mixture of people coming from JTeam, Hippo, the dutch police working on data analytics, developers working at RIPE and many more.
If you are interested in more data analysis, search and data storage - do not miss registration for Berlin Buzzwords on June 6/7th 2011.