December Apache Hadoop Get Together @ Berlin

2009-11-15 18:01
As announced at ApacheCon US, the next Apache Hadoop Get Together Berlin is scheduled for December 2009.

When: Wednesday, December 16, 2009 at 5:00pm
Where: newthinking store, Tucholskystr. 48, Berlin

As always, there will be slots of 20min each for talks on your Hadoop topic. After each talk there will be plenty of time for discussion. You can order drinks directly at the bar in the newthinking store. If you like, you can order pizza. We will go to Cafe Aufsturz after the event for some beer and something to eat.

Talks scheduled so far:

Richard Hutton ( "Moving from five days to one hour." - This talk explains how we made data processing scalable at The company's core business is online advertisement targeting. Our servers receive 10,000 requests per second resulting in data of 100GB per day.

As the classical data warehouse solution reached its limit, we moved to a framework built on top of Hadoop to make analytics speedy, data mining detailed and all of our lives easier. We will give an overview of our solution involving file system structures, scheduling, messaging and programming languages from the future.

Jörg Möllenkamp (Sun): "Hadoop on Sun"
Abstract: Hadoop is a well-known technology inside Sun. This talk shows some interesting use cases of Hadoop in conjunction with Sun technologies. The first use case demonstrates how Hadoop can be used to load a massive multicore system with up to 256 threads in a single system to the max. The second use case shows how several mechanisms integrated in Solaris can ease the deployment and operation of Hadoop even in non-dedicated environments. The last use case will show the combination of the Sun Grid Engine and Hadoop. Talk may contain command-line demonstrations ;).

Nikolaus Pohle (nurago): "M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns."

We would like to invite you, the visitor, to also tell your Hadoop story. If you like, you can bring slides - there will be a projector.

A big thanks goes to the newthinking store for providing a room in the center of Berlin for us. Another big thanks goes to StudiVZ for sponsoring videos of the talks. Links to the videos will be posted here as well as on the StudiVZ blog.

Please do indicate on the following upcoming event if you are planning to attend to make planning (and booking tables at Aufsturz) easier:

Looking forward to seeing you in Berlin,

Apache Con US - Program up

2009-10-29 07:35
The final program is available for download over at The schedule is packed with interesting talks on Hadoop, Lucene, Tomcat, httpd, web services and OSGi. For those less tech-savvy there is a business track explaining how to best use open source software in an enterprise environment. There is also a community track explaining what makes open source projects successful.

Looking forward to seeing you in Oakland.

Lucene 2.9 White Paper

2009-10-28 21:51
Lucid recently published a white paper that explains the changes and improvements in the new 2.9 release. It is interesting for anyone who is thinking about upgrading to the new Lucene version or who generally wants to know what is going on at Lucene.

Lucene 2.9 @ Heise

2009-10-06 18:13
After last week's Hadoop Get Together, Heise published an in-depth article on the changes and improvements that come with the latest Lucene 2.9 release.

Thanks to Simon Willnauer for helping me write this article and patiently explaining several new features. Thanks also to Uwe Schindler for kindly proof-reading the article before it was sent out to Heise.

Mahout@TU WS 09/10

2009-09-09 23:08
There is going to be a project/seminar course at TU Berlin on Apache Mahout. The goal is to introduce students to working on a free software project, interacting with the community and building production-ready software.

Students will be given several potential tasks, ranging from optimizing existing implementations and implementing new algorithms to (depending on their prior knowledge) improving, scaling and parallelizing existing algorithms.

Successful completion of the course depends on a number of factors: interaction of the student with the community, ability to write tested (as in test-first-developed) code that performs well in large-scale environments, ability to show incremental development progress at each iteration, ability to review patches and improvements, and usage of tools like SCM, issue tracker and mailing lists. Of course, theoretical background - that is, understanding existing publications as well as extending their ideas - is crucial as well.

If you are a student interested in Mahout and are missing some course work, consider subscribing to the Mahout course at DIMA Berlin (linked below). The goal is for your work to be integrated in one of the next releases, once the community is satisfied.

If you are a Mahout developer or user and have some issue that you consider suitable for a student to solve, please provide your ideas.

Location: TU Berlin
Link out: Click here
Start Date: 2009-10-01
End Date: 2010-03-31

GSoC at Mahout

2009-09-09 22:22
GSoC 2009 is about to finish: Final evaluations are through, most of the code submitted by Mahout's students has been committed to svn, code samples are on their way to Google.

In Mahout, we had three students joining the project: Robin worked on an HBase-based Naive Bayes extension and on frequent itemset discovery, David contributed a distributed LDA implementation, and Deneche worked on a Random Forest implementation. All three of them have done great work during this summer, contributing not only code but valuable input on the project's mailing lists as well. As a result, all three of them have been given committer status by the end of GSoC.

Apart from three new additions to the code base, summer also brought quite some traffic to the user list - not only in terms of subscriptions but also in terms of developers contributing to the discussions online. Currently, it looks like the project is really gaining momentum, as also noted in Grant Ingersoll's post.

Discussions on the dev list on the future road map of Mahout clearly showed that the developers share the vision of a scalable, potentially distributed, stable machine learning library, and that the focus should be on production-ready code under a commercially friendly license instead of bleeding-edge research implementations. Last but not least, the goal is to build a lively, diverse community around the project to guarantee further development and user support.

2009 brought quite a few talks on the topic of Mahout in both Germany and the US (besides all the events on Hadoop, scalable databases and cloud computing in general), with an Apache Con US talk introducing Mahout in Oakland still to come.

Yesterday, a great article introducing Apache Mahout with hands-on examples was published on IBM developerWorks by Grant Ingersoll. Check it out if you want to learn more about Mahout and machine learning in general.

First NoSQL Meetup in Germany

2009-09-09 18:58
On October 22nd, 2009, the first NoSQL Meetup Germany is going to take place at the newthinking store in Berlin:

Please submit your presentation proposals by September 22nd; accepted speakers will be notified soon after.

If you would like to sponsor the event, feel free to contact us: We would be very happy to provide videos after the event and free drinks for everyone during the event.

Hope to see you soon in Berlin.

Apache Con drawing closer

2009-09-04 06:47
In November I will be traveling to Oakland - for me it is the first Apache Con US ever. It is also the first Apache Con at which I will be giving a talk in one of the main tracks:

I will be presenting Apache Mahout, giving an overview of the project and our current status, and explaining which problems can be solved with the current implementation. The talk will conclude with an outlook on upcoming tasks and features our users can expect in the near future.

There is great news already: First commercial users like Mippin are explaining their experiences with Mahout.

Currently, I am looking forward to meeting several Mahout (and Lucene, Hadoop, Solr, ...) committers there. I met some at Apache Con EU already, but it's always nice to talk in person to people one previously knew only from mailing lists. Of course I am also looking forward to having time to review and write code. Hope to see you there.

Update: Flights booked.

Inglourious Basterds

2009-08-24 22:48
This evening I went to the Odeon cinema in Berlin Schöneberg. It is a pretty traditional, old-fashioned and very lovely cinema that has specialised in showing non-dubbed, original versions of movies.

Showing the great movie Inglourious Basterds, the cinema was completely sold out today. Fortunately we were able to grab some of the last tickets.

Just in case the entrance seems familiar to those who have attended a Mahout presentation in the recent past - a picture of the Odeon usually visualises one part of my motivation on the Mahout slides ;)

Apache Hadoop Event Blog

2009-08-24 20:38
As Apache Hadoop becomes ever more popular in both industry and research, user groups, conferences and hacking days are being scheduled around the world. The goal of the event calendar blog hosted on is to provide a common space for organizers to announce their events and for potential participants to look for new conferences.