Dragan Milosevic on Product Search and Reporting with Hadoop

2010-03-19 20:30
At the last Hadoop Get Together in Berlin, Dragan Milosevic from zanox gave a presentation on "Product Search and Reporting powered by Hadoop". The talk was recorded on video, and the result is now available online:

Hadoop Dragan Milosevic from Isabel Drost on Vimeo.

Feel free to share and distribute the video. Thanks to Dragan for a fantastic talk on Zanox's usage of Hadoop - and for providing some background information on why and how Zanox introduced Hadoop into its systems.

Another thanks to Nokia for sponsoring the video taping - and to newthinking for providing the location for free.

One more video to go. It will be available early next week.

Chris Male on spatial search with Lucene

2010-03-16 20:42
Last week the March 2010 Hadoop Get Together took place in Berlin. The last speaker was Chris Male on spatial search with Lucene and Solr. The video is now available online:

Lucene Chris Male from Isabel Drost on Vimeo.

Feel free to share and distribute the video to anyone who might be interested. Thank you Chris, for traveling over from Amsterdam for an awesome talk on spatial search.

If you want to learn more about what people over at Lucene and Solr are currently working on, head over to Berlin Buzzwords - a conference on scalable search, storage and data analysis. If you have interesting projects yourself, feel free to submit a talk.

Thanks to Nokia for sponsoring the video taping - and again as always thanks to newthinking for providing the location for free.

Slides are available

2010-03-11 00:49
Slides for the last Hadoop Get Together are available online:

Videos will follow as soon as they are ready. Watch this space for further updates.

Apache Hadoop Get Together March 2010

2010-03-11 00:40
Today (or more correctly, yesterday) the March 2010 Hadoop Get Together took place in the newthinking store. I arrived rather early to have some time to do some planning for Berlin Buzzwords - I got there nearly one hour before the meetup. However, it did not take very long until the first guests came to the store. So I quickly got my introductory slides in place - Martin from newthinking already had the room set up, the camera in place and the audio working.

When starting the meetup the room was already packed with some 60 people - we ended up having over 70 people interested in the mix of talks on Hadoop, HBase and Spatial search with Lucene and Solr. Doing the regular "Who are you"-round, we learned that there were people from nurago, Xing, StudiVZ, *lots and lots* of people from Nokia, Zanox, eCircle, nugg.ad and many others.

The meetup was kindly supported by newthinking store (venue for free) and Nokia (sponsored the videos). Steffen Bickel took his chance during the introduction to give a brief overview of Nokia and - guess - explain that Nokia is a great place to work and yeah - they are hiring!

The first talk was given by Bob Schulze, who joined the meetup coming from eCircle in Munich. Drawing on his experience with scaling their infrastructure beyond a regular database/data warehouse setup, he explained how HBase helped when processing really large amounts of data. Being an e-mail marketing provider, eCircle does have quite a bit of data to process. And yes, eCircle is hiring.

The second talk was by Dragan Milosevic from Zanox on scaling product search and reporting with Hadoop. Just like eCircle, Zanox came from a regular RDBMS setup that became too expensive and too complex to scale before switching over to a Hadoop/Lucene stack. He used his chance to make the Lucene developers aware of the fact that there are users who are actually using Lucene's compression features. Zanox, as well, is looking for people to hire.

The last talk was by Chris Male from JTeam in Amsterdam on the developments in Lucene and Solr to support spatial search. There are various development routes being followed: Cartesian tiers as well as numeric range searches. He also explained that most of the features are still under heavy development. He finished his talk with a demo on what can be done with spatial search in Lucene/Solr. You already guessed so, JTeam is hiring as well ;)
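To give an idea of what the numeric range route looks like, here is a minimal sketch in plain Python. It is not the Lucene or Solr implementation: the function names and the in-memory list of documents are made up for illustration. The idea is to turn "within N km of a point" into two cheap numeric range checks on latitude and longitude (a bounding box), and then run the exact distance calculation only on the documents that survive. The sketch assumes queries stay away from the poles and the date line.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Latitude/longitude ranges that fully contain the search circle.

    Assumption: the circle does not touch a pole or cross the date line.
    """
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def search_nearby(docs, lat, lon, radius_km):
    """docs is a list of (name, lat, lon); returns names sorted by distance."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for name, dlat, dlon in docs:
        # Step 1: two numeric range checks, the part an index can answer fast.
        if min_lat <= dlat <= max_lat and min_lon <= dlon <= max_lon:
            # Step 2: exact distance only for the remaining candidates.
            dist = haversine_km(lat, lon, dlat, dlon)
            if dist <= radius_km:
                hits.append((dist, name))
    return [name for _, name in sorted(hits)]
```

In a real search engine the range checks would be answered by the index instead of a loop over all documents; the loop here only keeps the sketch self-contained.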

After the talks we went to Cafe Aufsturz for beers, drinks and some food. People enjoyed talking to each other exchanging experiences. A Lucene focussed table quickly formed - main topics: Spatial search, Lucene/Solr merge threads, heavy committing, Mike McCandless (is this guy real or just an alter-ego of the Lucene community?).

At some time around 11p.m. the core of the guests (well - the Lucene part of the meetup, that is Simon, Uwe and the guys from JTeam) moved over to a bar close by next to cinema central for some more beer and drinks. At about 1a.m. it finally was time to head home.

I'd like to say thanks: First of all to the speakers. Without you the meetup would not be possible. Second to newthinking and Nokia for their support. And of course to all attendees for having grown the meetup to its current size.

I had a really nice evening with people from the Hadoop, HBase and Lucene community. Special thanks to you guys from JTeam for traveling 6h to Berlin just for a "little", though no longer that tiny, Hadoop meetup. The promise stands to visit one of your next Lucene meetups in Amsterdam and present Mahout there - however I need some help finding affordable accommodation ;)

Hope to see you all in June at Berlin Buzzwords.

Apache Hadoop Get Together - March 2010 - Update

2010-02-11 14:25
Due to conflicts in the schedule of newthinking store, we had to change the time of the Get Together slightly. We will start one hour earlier than announced.

When: March 10th, 4p.m.
Where: newthinking store, Tucholskystr. 48, Berlin Mitte

Looking forward to seeing you there.

Hadoop trainings in Europe

2010-02-02 19:23
Recently I received this mail from Christophe Bisciglia on Cloudera Hadoop trainings. Thought it might be interesting to the Hadoop Berlin community:

Hadoop Fans,

Over the next year, you'll see new options for Hadoop training and
certification from Cloudera. One of the first things you'll see will
be live sessions outside the US, tentatively planned for the April /
May time frame.

We've seen strong interest in Hadoop on all of our international
trips, so we'd like to ask for community input as we decide exactly
which cities to visit next. For cities we come to, we'll offer our 3
day developer training + certification, and with sufficient interest,
we may also include a 1 day training + certification program for
system administrators.

If you are interested in attending one or both of these sessions,
please fill out a brief survey (link below). If you're using Hadoop at
work, and it's time to train more of your team, you can let us know
how large of a group you have. Survey responses aren't a commitment to
attend, but we may reach out to respondents before we schedule a
session to get a better understanding of actual attendance.

You can fill out survey here: http://www.surveymonkey.com/s/MKGZHG9

If you have any trouble with the survey, or are interested in a
private training session, please don't hesitate to reach out directly.


Hadoop at Heise c't

2010-01-31 13:37
Interesting for those readers speaking German: Heise published an introductory article on Hadoop in its latest issue. Have fun reading.

Thanks to Simon for proof-reading and providing valuable input. Thanks to Thilo Fromm for the Hadoop graphics (unfortunately none of them got published in their original form), the catchy title, proof-reading the text over and over again and for keeping me sane during several past and coming months.

If you want to know more on Apache Hadoop, come watch my FOSDEM Hadoop talk next weekend. If you want to join discussions on Apache Hadoop and Lucene, stay tuned for a conference in Berlin on these topics.

March 2010 Apache Hadoop Get Together Berlin

2010-01-29 08:40
This is to announce the next Apache Hadoop Get Together that will take place in newthinking store in Berlin.

  • When: March 10th, 4p.m.
  • Where: Newthinking store Berlin

As always there will be slots of 20min each for talks on your Hadoop topic. After each talk there will be a lot of time to discuss. You can order drinks directly at the bar in the newthinking store. If you like, you can order pizza. We will go to Cafe Aufsturz after the event for some beer and something to eat.

Talks scheduled so far:

Chris Male (JTeam/ Amsterdam): Spatial Search with Solr

Abstract: The rise in popularity of Google Maps and mobile devices with GPS has resulted in a trend in the search field. People are no longer content with finding results that match a text query; they also want to find results which are near a location. So-called spatial search differs considerably from traditional free text search in that it cannot be achieved through common search techniques such as inverted indexes. Instead, new algorithms and data structures had to be developed that achieve efficient and accurate spatial search and that also allow spatial search to have a role in the determination of a result's relevance. This technology has primarily been found in proprietary closed source search applications; however, in the last 12-18 months, considerable effort has been invested into bringing open source spatial search support to Apache Solr and Lucene. While much is still left to be done, this talk will introduce how spatial search is currently supported in Solr, what work is happening currently, and a roadmap for future developments.
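As a rough illustration of the kind of data structure the abstract refers to, the following Python sketch maps coordinates onto a grid so that a location can be looked up by a cell identifier. It is a simplified grid scheme in the spirit of tiered grids, not the code used in Lucene or Solr; the term format, the function names and the choice of tier are all assumptions made for this example, and it assumes a tier of at least 2.

```python
def tier_cell(lat, lon, tier):
    """Return (column, row) of the grid cell containing the point.

    Tier t splits the world into 2**t columns and 2**t rows.
    """
    cells = 2 ** tier
    x = int((lon + 180.0) / 360.0 * cells)
    y = int((lat + 90.0) / 180.0 * cells)
    # Clamp the edge cases lon == 180 and lat == 90 into the last cell.
    return min(x, cells - 1), min(y, cells - 1)


def cell_term(x, y, tier):
    """Hypothetical term format for a grid cell, e.g. 'tier8_137_202'."""
    return f"tier{tier}_{x}_{y}"


def build_index(docs, tier):
    """docs is a list of (name, lat, lon); returns cell term -> list of names."""
    index = {}
    for name, lat, lon in docs:
        x, y = tier_cell(lat, lon, tier)
        index.setdefault(cell_term(x, y, tier), []).append(name)
    return index


def candidates(index, lat, lon, tier):
    """Collect documents from the query cell and its eight neighbours."""
    cx, cy = tier_cell(lat, lon, tier)
    cells = 2 ** tier
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            x = (cx + dx) % cells  # longitude wraps around the globe
            y = cy + dy
            if 0 <= y < cells:  # latitude does not wrap
                found.extend(index.get(cell_term(x, y, tier), []))
    return found
```

The candidates returned this way are only a coarse pre-selection; an exact distance check would still be needed afterwards. Picking the tier is the main design choice: coarse tiers return too many candidates, fine tiers may miss results that lie more than one cell away.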

Dragan Milosevic (zanox/ Berlin): Product Search and Reporting powered by Hadoop

Abstract: To efficiently process and index 80 million products, as well as store and analyse 30 million clicks and 500 million views daily, Zanox AG is using Hadoop HDFS and MapReduce technologies. This talk will present product-processing and reporting frameworks running on a 17-node Hadoop cluster, being able to (1) robustly store products and tracking data in a distributed manner, (2) rapidly consolidate, normalise and categorise products, (3) merge and aggregate tracking data and (4) efficiently build indexes for supporting distributed search and reporting, running in several search clusters.
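To make step (3) a bit more concrete, here is a small Python sketch of the map/shuffle/reduce pattern applied to tracking data. It runs in a single process and has nothing to do with the actual Zanox jobs: the tab-separated log format (product id, then event type) and all function names are assumptions for illustration only.

```python
from collections import defaultdict


def map_event(line):
    """Map step: turn one log line into a ((product_id, event_type), 1) pair.

    Assumed line format: '<product_id>\t<event_type>'.
    """
    product_id, event_type = line.strip().split("\t")
    yield (product_id, event_type), 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum up the counts for one (product_id, event_type) key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_event(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())
```

On a cluster the map and reduce functions would run in parallel on different nodes and the grouping step would be handled by the framework; the logic per key stays the same.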

Bob Schulze (eCircle/ Munich): Database and Table Design Tips with HBase

Abstract: Recurring design patterns for the BigTable/HBase storage model.

A big thanks goes to the newthinking store for providing a room in the center of Berlin for us. Another big thanks goes to Nokia Gate 5 for sponsoring videos of the talks. Links to the videos will be posted here.

Please do indicate on the following Upcoming event if you are planning to attend to make planning (and booking tables at Aufsturz) easier. Registration through Xing is possible as well.

Looking forward to seeing you in Berlin,

Third "December Hadoop Get Together" video online

2010-01-05 19:29
In the following video, taken at the last Hadoop Get Together in Berlin, Jörg Möllenkamp explains why Hadoop is interesting for Sun - and why Sun hardware might be a good fit for Hadoop applications:

Hadoop Jörg Möllenkamp from Isabel Drost on Vimeo.

In a blog post published after the event, Jörg gives more details on the idea of Parasitic Hadoop that he introduced at the meetup.

Second December Hadoop Get Together video

2010-01-03 14:57
Richard Hutton from nugg.ad explained how they scaled their ad recommendation system to an increasing number of users with the help of Hadoop. To learn more on their use case and details on which problems they solved with Hadoop, watch the video below:

Hadoop Richard Hutton from Isabel Drost on Vimeo.