Second December Hadoop Get Together video

2010-01-03 14:57
Richard Hutton from nugg.ad explained how they scaled their ad recommendation system to an increasing number of users with the help of Hadoop. To learn more about their use case and the problems they solved with Hadoop, watch the video below:

Hadoop Richard Hutton from Isabel Drost on Vimeo.

With a little help from my friends

2009-12-31 23:55
The end of the year 2009 is quickly approaching. To me it feels a little like it ran away far too quickly. So instead of taking part in the annual review of past events, I would like to use it as an opportunity to say thank you: The past twelve months were a lot of fun with lots of interesting, nice people from all over the world. I got the chance to meet quite a bit of the Mahout community, I got lots and lots of new developers from all over Germany - or more precisely the EU - to attend the Apache Hadoop Get Together in Berlin. The interest in Mahout has grown tremendously over the past year.

All of this would not have been possible without the help of many people: First of all I'd like to thank Thilo Fromm - for making me happy whenever I was disappointed, for consoling me when I was sad, for patiently listening to me nervously whining before each and every talk, for kindly reviewing my slides and last but not least for helping me fix some of the problems that bugged me. Oh - and thanks for helping me fix, within minutes, the issue in the ZooKeeper C client that had puzzled me for days.

Another big Thanks goes to my family, first and foremost my mum, who kindly took care of organizing quite a bit of my paperwork and kept me on schedule with so many "unimportant" tasks like getting an appointment with some hospital to finally get the screws taken out of my knee ;)

A special thanks goes to the growing Mahout community as well as to the Lucene people - you know who you are - keep up the great work: You rock!

Furthermore, there are students at TU Berlin who have shown that with Mahout it is "dead-simple" to write an application that, given a stream of documents, groups them by topic and makes the result searchable in Solr. Thanks to you for solving the minor and major problems, for communicating with the community and for being transparent about problems. Looking forward to continuing to work with you next year.

Finally, a big thank you to all of the speakers, sponsors and attendees of the Apache Hadoop Get Together, the NoSQL conference and the Apache Dinner Berlin - without you these events would never have been possible. Looking forward to seeing you again in January/March 2010!

I hope I didn't forget too many people - just in case: I am pretty grateful for all the input, help and feedback I got this year.

PS: Another thanks to the spaceboyz visiting Berlin for 26C3 for helping Thilo tidy up our apartment after Congress was over this year ;)

First December Apache Hadoop Berlin video online

2009-12-31 20:27
The video of Nikolaus Pohle's talk at the December Apache Hadoop Get Together Berlin is online already - more to come soon.

hadoop nikolaus pohle from Isabel Drost on Vimeo.



Thanks to Martin from newthinking for videotaping and uploading. Thanks to StudiVZ for sponsoring the video.

Summary - December Get Together

2009-12-16 22:23
Today the seventh Apache Hadoop Get Together took place in Berlin. The room was again packed with more than 40 people from various companies, with and without practical Hadoop experience: There were people from Nokia Gate 5, Sun, nurago, StudiVZ, Dawanda, Last.fm and nugg.ad. There were people from academia, e.g. HPI Potsdam. There were also a few freelancers interested in the topic or offering help with Hadoop.

We had three very interesting talks. The first one was given by Richard Hutton from nugg.ad on their usage of Hadoop. They provide targeted advertising services to their clients. Naturally, they need to process lots of user interactions to be able to draw reliable conclusions. nugg.ad started out with a traditional system setup: Erlang loggers in front, with data fed into well-known data warehouse infrastructure, analysed, and the results pushed back to the frontends. However, this architecture would scale only so far. So at the beginning of 2009 they started migrating their systems over to Hadoop. (The speaker thanked Tom White for publishing the Hadoop book at O'Reilly, which obviously helped the developers a lot.) Today, nugg.ad has cut analysis time from one to two days down to one to two hours. I will link the slides of the talk as soon as I have the pdf version available.

The second talk was given by Jörg Möllenkamp on what Sun is doing with Hadoop. Sun does have "special hardware" - special in that they have systems with up to 512 virtual processors on one chip. With Solaris they have an operating system that scales to that architecture. But now they are looking for applications that can use such hardware efficiently as well. Hadoop is well suited for distributing computations - so it looked like a great fit for Sun. Slides are available online.



The last talk was given by Nikolaus Pohle from nurago. They switched to Hadoop only recently. Coming from online market analysis, they have to analyse lots of user interaction data. Currently they are moving away from a MySQL-based architecture to a distributed system based on HDFS and Map/Reduce. To make writing M/R jobs easier for their employees, they built their own abstract language on top of Hadoop that helps with formulating recurring jobs. That sounds a lot like what Pig or Cascading already do - but it is specifically targeted at the type of jobs they have. Slides are available online. There is also a pdf version for users who prefer open formats.

If anyone should be interested in it, I also put my introductory slides online.

The next meetup will be in March 2010. It will feature a talk by Zanox on their Hadoop usage, one talk by eCircle from Munich as well as one talk by Nokia. You are very welcome to join us. If you would like to give a presentation yourself, please contact me. If you would like to sponsor the event, please send me an e-mail.

A big Thank You to all the speakers - Nikolaus Pohle from nurago, Jörg Möllenkamp from Sun and Richard Hutton from nugg.ad - without you, the event would not be possible. Another big Thank You to newthinking for providing the venue for free. And, last but not least, another big Thank You to StudiVZ for sponsoring the videos. They will be linked to from here as well as from the StudiVZ blog as soon as they are available.

On Thursday: Open Hadoop User Group Munich

2009-12-16 06:06
If one evening of Apache Hadoop is not enough for you: The next Christmas Meetup in Germany takes place one day later in Munich.


  • When: Thursday December 17, 2009 at 5:30pm, open end
  • Where: eCircle AG, Nymphenburger Straße 86, 80636 München ("Bruckmann" Building, "U1 Mailinger Str", map in German http://www.ecircle.com/de/kontakt/anfahrt.html and look for the signs)


Talks scheduled by Bob and Lars:

Bob Schulze from eCircle will be giving the first presentation on how eCircle is planning to use the Hadoop stack.

Dave Butlerdi will be giving an overview of his usage of Hadoop.

Lars George will give a state of affairs of the HBase project: what it is, what it does and how he has been using it (since early 2008).

There is a quick train connection from Berlin to Munich. So if you are attending the Berlin Get Together, it is very easy to travel south to Munich one day later and visit the Munich event as well.

On Wednesday: December Apache Hadoop @ Berlin

2009-12-14 20:15
This week on Wednesday at 5 p.m. the December Hadoop Get Together takes place at the newthinking store in Berlin.

Talks scheduled so far:


  • Richard Hutton (nugg.ad): “Moving from five days to one hour.”
  • Jörg Möllenkamp (Sun): “Hadoop on Sun”
  • Nikolaus Pohle (nurago): “M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns.”


Videos of the talks will be available after the Meetup and linked to by StudiVZ (thanks for sponsoring).

As this is the last Meetup before Christmas there will be cookies waiting for you.

If you want to get notifications of future events on Apache Hadoop, NoSQL, Apache Lucene - be it trainings, meetups or conferences - feel free to subscribe to the mailing list or join the Xing Group that accompanies the Berlin Get Together.

Apache Hadoop at FOSDEM 2010

2009-12-11 09:19
Though the official schedule is not yet online: I will be giving an introductory talk about Apache Hadoop at next year's FOSDEM (Free and Open source Software Developers' European Meeting) in Brussels. This will be the 10th birthday of the event - looking forward to a fun event, meeting other free and open source software developers from all over Europe.





If you are an Apache Hadoop developer and would like me to include some particular topic in the talk - please feel free to contact me. If you are an Apache Hadoop user and would like to learn more about the project, please come to the talk and ask questions. If you are an Apache Hadoop newbie - feel free to join us.

In addition there will be a NoSQL Dev Room at FOSDEM as well. The call for presentations is up already. So if you are doing fun stuff with CouchDB, HBase and friends or are a developer of these projects - submit a talk and join us in early February in Brussels.

Reminder: Apache Hadoop Get Together next week

2009-12-07 20:16
Just a tiny little reminder: The Apache Hadoop Get Together Berlin is scheduled to take place next week on Wednesday.

When: 16th of December, 5PM
Where: newthinking store, Tucholskystr. 48, Berlin Mitte
Kindly sponsored by: newthinking store (location) and StudiVZ (videos).

Please register (or use Xing for registration) so planning becomes a bit easier.

Talks scheduled:

  • Richard Hutton (nugg.ad): "Moving from five days to one hour."
  • Jörg Möllenkamp (Sun): "Hadoop on Sun"
  • Nikolaus Pohle (nurago): "M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns."


Looking forward to seeing you in Berlin next week.

Mahout 0.2 released

2009-11-18 10:52
Apache Mahout 0.2 has been released and is now available for public download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Up to date maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/


Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license (http://www.apache.org/licenses/LICENSE-2.0).

Mahout is a machine learning library meant to scale: Scale in terms of community to support anyone interested in using machine learning. Scale in terms of business by providing the library under a commercially friendly, free software license. Scale in terms of computation to the size of data we manage today.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop project, Mahout lets you solve popular machine learning problem settings like clustering, collaborative filtering and classification on terabytes of data across thousands of computers.

Implemented with scalability in mind, the latest release brings many performance optimizations so that the library performs well even in a single-node setup.

The complete changelist can be found here:

http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include


  • Major performance enhancements in Collaborative Filtering, Classification and Clustering
  • New: Latent Dirichlet Allocation (LDA) implementation for topic modelling
  • New: Frequent Itemset Mining for mining top-k patterns from a list of transactions
  • New: Decision Forests implementation for Decision Tree classification (In Memory & Partial Data)
  • New: HBase storage support for Naive Bayes model building and classification
  • New: Generation of vectors from Text documents for use with Mahout Algorithms
  • Performance improvements in various Vector implementations
  • Tons of bug fixes and code cleanup


Getting started: New to Mahout?



For more information on Apache Mahout, see http://lucene.apache.org/mahout

A very BIG Thank You to all those who made this release happen!

December Apache Hadoop Get Together @ Berlin

2009-11-15 18:01
As announced at ApacheCon US, the next Apache Hadoop Get Together Berlin is scheduled for December 2009.

When: Wednesday December 16, 2009 at 5:00pm
Where: newthinking store, Tucholskystr. 48, Berlin

As always there will be slots of 20min each for talks on your Hadoop topic. After each talk there will be plenty of time for discussion. You can order drinks directly at the bar in the newthinking store. If you like, you can order pizza. We will go to Cafe Aufsturz after the event for some beer and something to eat.

Talks scheduled so far:

Richard Hutton (nugg.ad): "Moving from five days to one hour." - This talk explains how we made data processing scalable at nugg.ad. The company's core business is online advertisement targeting. Our servers receive 10,000 requests per second resulting in data of 100GB per day.

As the classical data warehouse solution reached its limit, we moved to a framework built on top of Hadoop to make analytics speedy, data mining detailed and all of our lives easier. We will give an overview of our solution involving file system structures, scheduling, messaging and programming languages from the future.

Jörg Möllenkamp (Sun): "Hadoop on Sun"
Abstract: Hadoop is a well-known technology inside of Sun. This talk aims to show some interesting use cases of Hadoop in conjunction with Sun technologies. The first use case demonstrates how Hadoop can be used to load a massive multicore system with up to 256 threads in a single system to the max. The second use case shows how several mechanisms integrated in Solaris can ease the deployment and operation of Hadoop even in non-dedicated environments. The last use case shows the combination of the Sun Grid Engine and Hadoop. Talk may contain command-line demonstrations ;).

Nikolaus Pohle (nurago): "M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns."

We would also like to invite you, the visitor, to tell your Hadoop story. If you like, you can bring slides - there will be a beamer.

A big Thanks goes to the newthinking store for providing a room in the center of Berlin for us. Another big thanks goes to StudiVZ for sponsoring videos of the talks. Links to the videos will be posted here as well as on the StudiVZ blog.

Please indicate on the following Upcoming event page whether you are planning to attend, to make planning (and booking tables at Aufsturz) easier:

http://upcoming.yahoo.com/event/4842528/


Looking forward to seeing you in Berlin,
Isabel