First December Apache Hadoop Berlin video online

2009-12-31 20:27
The video of Nikolaus Pohle's talk at the December Apache Hadoop Get Together Berlin is already online - more to come soon.

hadoop nikolaus pohle from Isabel Drost on Vimeo.

Thanks to Martin from newthinking for video taping and uploading. Thanks to StudiVZ for sponsoring the video.

Summary - December Get Together

2009-12-16 22:23
Today the seventh Apache Hadoop Get Together took place in Berlin. The room was again packed with more than 40 people from various companies, with and without practical experience with Hadoop. There were people from Nokia Gate 5, Sun, nurago, StudiVZ and Dawanda. There were people from academia, e.g. HPI Potsdam, and a few freelancers interested in the topic or providing help with Hadoop.

We had three very interesting talks. The first one was given by Richard Hutton on his company's usage of Hadoop. They provide targeted advertisement services to their clients. Naturally, they need to process lots of user interactions to be able to draw reliable conclusions. They started out with a traditional system setup: Erlang loggers in front, with data fed into well-known data warehouse infrastructure, analysed, and the results pushed back to the frontends. However, this architecture would scale only so far. So at the beginning of 2009 they started migrating their systems over to Hadoop. (A thanks from the speaker to Tom White for publishing the Hadoop book at O'Reilly, which obviously helped the developers a lot.) Today, they are down from one to two days for analysis to one to two hours. I will link the slides of the talk as soon as I have the pdf version available.

The second talk was given by Jörg Möllenkamp on what Sun is doing with Hadoop. Sun does have "special hardware" - special in that they have systems with up to 512 virtual processors on one chip. With Solaris they have an operating system that scales to that architecture. But now they are looking for applications that can use such hardware efficiently as well. Hadoop is well suited for distributing computations - so it looked like a great fit for Sun. Slides are available online.

The last talk was given by Nikolaus Pohle from nurago. They switched to Hadoop only recently. Coming from online market analysis, they have to analyse lots of user interaction data. Currently they are moving away from a MySQL-based architecture to a distributed system based on HDFS and Map/Reduce. To make writing M/R jobs easier for their employees, they built their own abstract language on top of Hadoop that helps to formulate recurring jobs. That sounds a lot like what Pig or Cascading already do - but it is specially targeted at the type of jobs they have. Slides are available online. There is also a pdf version for users who prefer open formats.

If anyone should be interested in it, I also put my introductory slides online.

The next meetup will be in March 2010. It will feature a talk by Zanox on their Hadoop usage, one talk by eCircle from Munich and one talk by Nokia. You are very welcome to join us. If you would like to give a presentation yourself, please contact me. If you would like to sponsor the event, please send me an e-mail.

A big Thank You to all the speakers - Nikolaus Pohle from nurago, Jörg Möllenkamp from Sun and Richard Hutton - without you, the event would not be possible. Another big Thank You to newthinking for providing the venue for free. And, last but not least, another big Thank You to StudiVZ for sponsoring the videos. They will be linked to from here as well as from the StudiVZ blog as soon as they are available.

On Wednesday: December Apache Hadoop @ Berlin

2009-12-14 20:15
This week on Wednesday at 5 p.m. the December Hadoop Get Together takes place at the newthinking store in Berlin.

Talks scheduled so far:

  • Richard Hutton: “Moving from five days to one hour.”
  • Jörg Möllenkamp (Sun): “Hadoop on Sun”
  • Nikolaus Pohle (nurago): “M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns.”

Videos of the talks will be linked to by StudiVZ (thanks for sponsoring) after the Meetup is over.

As this is the last Meetup before Christmas there will be cookies waiting for you.

If you want to get notifications of future events on Apache Hadoop, NoSQL, Apache Lucene - be it trainings, meetups or conferences - feel free to subscribe to the mailing list or join the Xing group that accompanies the Berlin Get Together.

Reminder: Apache Hadoop Get Together next week

2009-12-07 20:16
Just a tiny little reminder: The Apache Hadoop Get Together Berlin is scheduled to take place next week on Wednesday.

When: 16th of December, 5PM
Where: newthinking store Tucholskystr. 48, Berlin Mitte
Kindly sponsored by: newthinking store (location) and StudiVZ (videos).

Please register (or use Xing for registration) so planning becomes a bit easier.

Talks scheduled:

  • Richard Hutton: "Moving from five days to one hour."
  • Jörg Möllenkamp (Sun): "Hadoop on Sun"
  • Nikolaus Pohle (nurago): "M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns."

Looking forward to seeing you in Berlin next week.

December Apache Hadoop Get Together @ Berlin

2009-11-15 18:01
As announced at ApacheCon US, the next Apache Hadoop Get Together Berlin is scheduled for December 2009.

When: Wednesday, December 16, 2009 at 5:00pm
Where: newthinking store, Tucholskystr. 48, Berlin

As always there will be slots of 20min each for talks on your Hadoop topic. After each talk there will be plenty of time to discuss. You can order drinks directly at the bar in the newthinking store. If you like, you can order pizza. We will go to Cafe Aufsturz after the event for some beer and something to eat.

Talks scheduled so far:

Richard Hutton: "Moving from five days to one hour." - This talk explains how we made data processing scalable. The company's core business is online advertisement targeting. Our servers receive 10,000 requests per second resulting in data of 100GB per day.

As the classical data warehouse solution reached its limit, we moved to a framework built on top of Hadoop to make analytics speedy, data mining detailed and all of our lives easier. We will give an overview of our solution involving file system structures, scheduling, messaging and programming languages from the future.

Jörg Möllenkamp (Sun): "Hadoop on Sun"
Abstract: Hadoop is a well-known technology inside of Sun. This talk wants to show some interesting use cases of Hadoop in conjunction with Sun technologies. The first use case demonstrates how Hadoop can be used to load a massive multicore system with up to 256 threads in a single system to the max. The second use case shows how several mechanisms integrated in Solaris can ease the deployment and operation of Hadoop even in non-dedicated environments. The last use case will show the combination of the Sun Grid Engine and Hadoop. Talk may contain command-line demonstrations ;).

Nikolaus Pohle (nurago): "M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns."

We would like to invite you, the visitor, to also tell your Hadoop story. If you like, you can bring slides - there will be a beamer.

A big Thanks goes to the newthinking store for providing a room in the center of Berlin for us. Another big thanks goes to StudiVZ for sponsoring videos of the talks. Links to the videos will be posted here as well as on the StudiVZ blog.

Please do indicate on the following upcoming event if you are planning to attend to make planning (and booking tables at Aufsturz) easier:

Looking forward to seeing you in Berlin.

Hadoop Get Together Berlin @ ApacheCon US Barcamp

2009-11-03 21:05
This is my first real day at ApacheCon US 2009. I arrived yesterday afternoon and was kept awake by three Lucene committers until midnight: "Otherwise you will have a very bad jetlag"... Admittedly it did work out: I slept like a baby until about 8:00 a.m. the next morning and am not that tired today.

Today the Hackathon, trainings and the Apache barcamp happen in parallel. Ross Gardler tricked me into doing a presentation on my experiences with running local user meetups. I put the slides online.

The general consensus was that it is actually not that hard to do such a meetup - at least if you have someone locally to help organize it or do it in a town you know very well. There are ways to get support from the ASF for doing such meetups - people help you get speakers, talk to potential sponsors or find a location. In my experience, if you are doing the event in your hometown, finding a location is not that hard: either you are lucky enough to have someone like newthinking store around, or you can contact your local university or even your employer to find a conference room that you can use for free.

Getting the first two to three meetups up and running - especially finding speakers - is hard. However, you should be able to benefit from already being part of an Apache project: you probably know your community and know who would be willing to speak at one of those meetups. Once the meetup is well established, it should be possible to find sponsors to pay for video taping, free beer and pizza.

Keep in mind that having a fixed schedule ready in advance helps to attract people - it's always good to know why one should travel to the meetup by train or plane. Don't forget to plan for time for socializing after the event - having some beer and maybe food together makes it easy for people to connect after the meetup.

Apache Hadoop Get Together Berlin

2009-10-29 07:40
Title: Apache Hadoop Get Together Berlin
Location: newthinking store, Tucholskystr. 48, Berlin Mitte
Description: The upcoming Apache Hadoop Get Together Berlin will feature four talks by people explaining how they put Hadoop to good use in their enterprise. A table at Cafe Aufsturz is booked already. Talks will be announced late next week.
Start Time: 17:00
Date: 2009-12-16

Videos are up

2009-10-22 07:31
As of yesterday the videos of the last Apache Hadoop Get Together Berlin are available online.

Thanks to the speakers for providing insight into their projects and thanks to Cloudera for sponsoring the videos.

The next meetup will be announced soon - three talks have already been proposed. In addition, StudiVZ offered to sponsor video taping of the next Get Together. Looking forward to seeing you in Berlin in December.

Lucene 2.9 @ Heise

2009-10-06 18:13
After last week's Hadoop Get Together, Heise published an in-depth article on the changes and improvements that come with the latest Lucene 2.9 release.

Thanks to Simon Willnauer for helping me write this article and patiently explaining several new features. Thanks also to Uwe Schindler for kindly proof-reading the article before it was sent out to Heise.

Slides are up

2009-09-30 09:02
The slides for yesterday's talks just arrived. They are available online at:

Videos will be online early next week.