CFP - Berlin Buzzwords 2011 - search, store, scale

2011-01-26 08:00
This is to announce Berlin Buzzwords 2011, the second edition of the successful conference on scalable and open search, data processing and data storage in Germany, taking place in Berlin.

Call for Presentations Berlin Buzzwords

Berlin Buzzwords 2011 - Search, Store, Scale

6/7 June 2011

The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics:

  • IR / Search - Lucene, Solr, katta or comparable solutions
  • NoSQL - like CouchDB, MongoDB, Jackrabbit, HBase and others
  • Hadoop - Hadoop itself, MapReduce, Cascading or Pig and relatives

Closely related topics not explicitly listed above are welcome. We are looking for presentations on the implementation of the systems themselves, real world applications and case studies.

Important Dates (all dates in GMT +2)

  • Submission deadline: March 1st, 2011, 23:59 CET
  • Notification of accepted speakers: March 22nd, 2011, CET
  • Publication of final schedule: April 5th, 2011
  • Conference: June 6/7, 2011

High quality, technical submissions are called for, ranging from principles to practice. We are looking for real world use cases, background on the architecture of specific projects and a deep dive into architectures built on top of e.g. Hadoop clusters.

Proposals should be submitted no later than March 1st, 2011. Acceptance notifications will be sent out soon after the submission deadline. Please include your name, bio and email, the title of the talk and a brief abstract in English. Please indicate whether you want to give a lightning (10min), short (20min) or long (40min) presentation, and indicate the level of experience with the topic your audience should have (e.g. whether your talk will be suitable for newbies or is targeted at experienced users). If you'd like to pitch your brand new product in your talk, please let us know as well - there will be extra space for presenting new ideas, awesome products and great new projects.

The presentation format is short. We will be enforcing the schedule rigorously.

If you are interested in sponsoring the event (e.g. we would be happy to provide videos after the event, free drinks for attendees as well as an after-show party), please contact us.

Follow @berlinbuzzwords on Twitter for updates. News on the conference will be published on our website.

Program Chairs: Isabel Drost, Jan Lehnardt, and Simon Willnauer.

Schedule and further updates on the event will be published on our website. Please re-distribute this CfP to people who might be interested.

Contact us at:

newthinking communications GmbH
Schönhauser Allee 6/7
10119 Berlin, Germany
Julia Gemählich
Isabel Drost
+49(0)30-9210 596

Apache Mahout in Amsterdam

2011-01-25 20:00
On February 7th there will be an Apache Mahout meetup in Amsterdam kindly organised by JTeam. There will be two presentations - one by myself on classification with Apache Mahout as well as a second one by Frank Scholten on clustering with Apache Mahout.

  • Time: 18:00
  • Location: Frederiksplein 1, 1017XK Amsterdam, The Netherlands

Looking forward to a few days in Amsterdam.


2011-01-23 15:46
It's already sort of a nice little tradition for me to spend the first weekend in February in Brussels for FOSDEM. This year I am particularly happy that there will be a Data Analytics Dev Room at FOSDEM. A huge thanks to @ogrisel and @nmaillot who have done most of the heavy lifting of getting the schedule in place.

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

Looking forward to an interesting Cloud Track, to meeting Pieter Hintjens who is going to give a talk on 0MQ, the DevOps presentation and lots of very interesting DevRooms. Looks like it's again going to be tough to decide which presentations to go to at any one time.

O'Reilly Strata Conference

2011-01-22 04:34
Title: O'Reilly Strata Conference
Location: Santa Clara
Description: Early next February O'Reilly is planning to put on a very interesting conference on the topic of data analysis and the business of generating value from raw digital data.

Strata 2011

I'm really glad to have received the acceptance notification for my presentation and travel sponsorship from the DICODE project. So see you in Santa Clara.
Start Date: 2011-02-01
End Date: 2011-02-03

If you are still unsure whether you should attend or not: Strata kindly handed out discount codes to speakers to share with their followers and readers. It saves you 25% of the registration cost - just use str11fsd during registration.

WiFi at the Apache Hadoop Get Together

2011-01-18 20:40
Just a brief reminder: The next Apache Hadoop Get Together is scheduled to take place on Thursday, January 27th at 6 p.m. at the Zanox Event Campus at Media Spree Berlin.

We have three very interesting talks. Though thirty guests have registered already, we still have a few free seats. Head over to the xing event page to register if you have not done so yet.

If you would like to have access to the local WiFi please let me know - I need to register your mail address with the venue two days before the event.

A huge thanks to Zanox for providing the location for free, another huge thanks to Cloudera for sponsoring video taping of the event.

Apache Hadoop Get Together Berlin - January 2011

2010-12-28 16:31
This is to announce the next Apache Hadoop Get Together sponsored by Cloudera and Zanox that will take place in the Zanox Event Campus in Berlin.

When: January 27th 2011, 6p.m.

Where: zanox Event Campus (Please note the changed event location.)


As always there will be slots of 30min each for talks on your Hadoop topic. After each talk there will be a lot of time to discuss. We will head over to a bar after the event for some beer and something to eat.

Talks scheduled so far:

Simon Willnauer: "Lucene 4 - Revisiting problems for speed"

Abstract: This talk presents a brief case study of long standing problems in Lucene and how they have been approached to gain sizable performance improvements. Each of the presented problems will come with a brief introduction, the implemented solution and the resulting performance improvements. This talk might be interesting even for non-Lucene folks.

Josh Devins: "Hadoop at Nokia"
Abstract: In this talk, Josh will outline some of the ways in which Nokia is using Hadoop. We will start by having a quick look at the practical side of getting started with Hadoop and outline cluster hardware and configuration and management with tools like Puppet. Next we'll dive head first into how Hadoop and its ecosystem are being utilized on a daily basis to perform business analytics, drive machine learning and help build data-driven products. We will also touch on how we go about collecting metrics from dozens of applications distributed in multiple data centers around the world. An open Q&A session will follow.

Paolo Negri: "The order of magnitude challenge: from 100K daily users to 1M"
Abstract: "Social games backends share many aspects of normal web applications, but exacerbate scaling problems. Follow this talk to see how we evolved a plain Ruby on Rails app to sustain 5000 reqs/sec, moved part of our data from SQL to NoSQL to reach 5 million queries per minute, and what we learned from this experience."

Please do indicate on Upcoming or Xing if you are coming so we can more safely plan capacities.

A big Thank You goes to zanox for providing the venue for free for our event as well as to Cloudera for supporting videos being taped of the presentations.

Looking forward to seeing you in Berlin,

White Christmas

2010-12-25 16:44
Christmas brought "a little surprise" to Germany last night:

The result this morning: a few cm (as in about 20) of snow - that is the white Christmas everyone had been looking forward to. Just one question: Where to put all that snow when digging out my car? ;)

Thanks to snow clearing services streets are all white but well drivable. Other than that lots of time for enjoying the white weather. I'll probably use the time off to go out with a sledge tomorrow.

Apache Hadoop - Trainings by Cloudera in Berlin

2010-12-22 23:53
Cloudera is offering trainings both for Administrators as well as for Developers early next year in Berlin. If you are getting started in using Apache Hadoop this might be a great option to get your developers and operations up to speed with the framework. If you are a regular of the local Apache Hadoop Get Together a discount code should have been sent to you by mail.

Devoxx – Day one – Java, Performance and Devops

2010-12-15 21:22
In his keynote Mark Reinhold provided some information on the very interesting features to be included in the Java 7 release. Generics will be easier to declare with the diamond operator. Nested try-finally constructs that are nowadays needed to safely close resources will no longer be necessary – there will be the option of implementing a Closeable interface supporting a method close() that gets called whenever objects of that class's type go out of scope. That way resources can be freed automatically. Though different in concept, it still reminds me a lot of the functionality typically provided by destructors in C++.
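A minimal sketch of the two features mentioned above. It assumes the form in which they shipped with the Java 7 release: the diamond operator, and the try-with-resources statement, which calls close() on any java.lang.AutoCloseable at the end of the try block. The class TrackedResource is a hypothetical resource invented for this illustration; it only records what happens to it.

```java
import java.util.ArrayList;
import java.util.List;

public class Java7Sketch {

    // Hypothetical resource for illustration: logs when it is opened, used and closed.
    static class TrackedResource implements AutoCloseable {
        private final List<String> log;

        TrackedResource(List<String> log) {
            this.log = log;
            log.add("open");
        }

        void use() {
            log.add("use");
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    static List<String> run() {
        // Diamond operator: the type argument is inferred from the declaration.
        List<String> log = new ArrayList<>();

        // try-with-resources: close() is called automatically when the block ends,
        // whether it ends normally or with an exception. No nested try-finally needed.
        try (TrackedResource resource = new TrackedResource(log)) {
            resource.use();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [open, use, close]
    }
}
```

Unlike a C++ destructor, the cleanup is tied to the try block rather than to the lifetime of the object itself.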

The support for lambda operators and direct method references that will greatly help reduce clutter due to nested inner classes has been postponed to later Java releases. Though it took 4 years to come up with the Java 7 release, new features are pretty much limited. However the current roadmap looks pretty much release date driven. The intention seems to be to get developers focussed on a limited set of reachable features to finally get the release out into the hands of users.

The speaker claimed that Oracle remains committed to Java development – first and foremost because it is a heavy Java user itself, but also in order to generate revenue indirectly (through selling support and consulting for Java related products) and directly (through Java support), and to reduce internal development cost and Java friction.

Though Oracle had a JVM implementation of its own (JRockit), development of HotSpot will be continued – mostly due to a larger number of developers being familiar with HotSpot. However monitoring and diagnosis tooling that was superior in JRockit is supposed to be ported to HotSpot.

In the core Java session I also went to the talk on Java performance analysis by Joshua Bloch. He did a good job bringing the topic of performance analysis on complex systems to software developers. In ancient times it was quite easy to estimate a piece of code's performance by static code analysis. Looking at the expression if (condition && secondCondition) it is still commonly considered to be faster to use “&&” over “&”. However looking at current CPU architectures that make heavy use of instruction pipelines it heavily depends on their branch prediction heuristics whether this statement is still true. Dirtying the pipeline by using && may well be more expensive than doing the extra evaluation. General message: The performance of your code in a real world system depends on the hardware it runs on, the operating system as well as the exact VM version used. Estimating performance based on static analysis only is no longer possible.
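To make the two operators concrete, here is a small sketch (class and method names are made up for this illustration). Both methods count the values in a range. The first uses the short-circuit “&&”, which introduces an extra conditional branch. The second uses the non-short-circuit “&” on booleans, which always evaluates both comparisons. For side-effect-free operands the results are identical. Which one is faster is deliberately not claimed here: as argued above, that depends on the hardware, the data and the VM, and can only be found out by measuring.

```java
public class ShortCircuitSketch {

    // Short-circuit AND: the second comparison is skipped when the first is false,
    // at the price of an additional conditional branch.
    static int countShortCircuit(int[] data, int low, int high) {
        int count = 0;
        for (int value : data) {
            if ((value >= low) && (value < high)) {
                count++;
            }
        }
        return count;
    }

    // Non-short-circuit AND on booleans: both comparisons are always evaluated.
    static int countEager(int[] data, int low, int high) {
        int count = 0;
        for (int value : data) {
            if ((value >= low) & (value < high)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] data = {1, 5, 10, 15, 20};
        // Both variants count the values 5, 10 and 15.
        System.out.println(countShortCircuit(data, 5, 20)); // prints 3
        System.out.println(countEager(data, 5, 20));        // prints 3
    }
}
```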

However even when doing benchmarks one might well reach false conclusions. It is common knowledge that a benchmark on a VM needs to be run multiple times – VM warmup phases are well known to developers, so the common performance pattern for one specific function usually looks like this:

However even when repeating the test on the same machine multiple times, the values seen after warm-up may be skewed substantially. The only remedy against reaching false conclusions is to do several VM runs, average the runs (also reporting the median and similar statistics that are less susceptible to outliers) and provide error bars for each averaged run. When comparing two different implementations the only way to reliably tell which one is better than the other is to do statistical significance tests. Consider the diagram below. When leaving error bars out, the left implementation seems clearly better than the right. However when taking into account how widely skewed the performance numbers are and adding error bars to the entries, this is no longer the case: Both runs are no longer statistically significantly different.
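The statistics involved can be sketched in a few lines. The following is an illustration under assumptions, not the method shown in the talk: it takes one timing per VM run, computes mean, median and standard error (the basis for error bars), and compares two implementations with Welch's t statistic, one common choice of significance test. The timings in main are made-up numbers. To actually decide on significance, the statistic still has to be compared against a critical value of Student's t distribution, which this sketch leaves out.

```java
import java.util.Arrays;

public class BenchmarkStatsSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Median of a sorted copy; less susceptible to outliers than the mean.
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Unbiased sample variance (divides by n - 1); needs at least two values.
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - m) * (x - m);
        }
        return sumSquares / (xs.length - 1);
    }

    // Standard error of the mean: a typical half-length for an error bar.
    static double standardError(double[] xs) {
        return Math.sqrt(sampleVariance(xs) / xs.length);
    }

    // Welch's t statistic for two samples with possibly unequal variances.
    static double welchT(double[] a, double[] b) {
        double pooled = sampleVariance(a) / a.length + sampleVariance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(pooled);
    }

    public static void main(String[] args) {
        // Hypothetical timings in milliseconds, one value per separate VM run.
        double[] left = {102.0, 98.0, 130.0, 95.0, 110.0};
        double[] right = {115.0, 99.0, 140.0, 104.0, 121.0};

        System.out.printf("left:  mean=%.1f median=%.1f stderr=%.1f%n",
                mean(left), median(left), standardError(left));
        System.out.printf("right: mean=%.1f median=%.1f stderr=%.1f%n",
                mean(right), median(right), standardError(right));
        System.out.printf("Welch t = %.2f%n", welchT(left, right));
    }
}
```

A t statistic close to zero means the difference between the means is small compared to the spread of the runs, which is exactly the situation where overlapping error bars warn against declaring a winner.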

Apache Mahout Hackathon Berlin

2010-12-14 20:50
Early next year - on February 19th/20th to be more precise - the first Apache Mahout Hackathon is scheduled to take place at c-base. The Hackathon will take one weekend. There will be plenty of time to hack on your favourite Mahout issue, to get in touch with two of the Mahout committers and get your machine learning project off the ground.

Please get in touch if you are planning to attend this event or register with the xing event so we can plan for enough space for everyone. If you have not registered for the event there is no guarantee you will be admitted.

If you'd like to support the event: We are still looking for sponsors for drinks and pizza.