Berlin Buzzwords 2010 - Scalability conference June 7th/8th in Berlin

2010-05-14 15:48
The Berlin Buzzwords schedule was published a few days ago. There are tracks specific to the three tags search, store and scale. We have a fantastic mixture of developers and users of the open source software projects that make scalable data processing possible today.

There are Steve Loughran, Aaron Kimball and Stefan Groschupf from the Apache Hadoop community. We have Grant Ingersoll, Robert Muir and the "Generics Policeman" Uwe Schindler from the Lucene community.

For those interested in NoSQL databases there are Mathias Stearn from MongoDB, Jan Lehnardt from CouchDB and Eric Evans, the guy who coined the term NoSQL one year ago.

The schedule has been published online. Visit the webpage and register for the conference - looking forward to seeing you in Berlin this summer!

Regular tickets are available online. In addition we offer student tickets: If you have a valid student ID, you are eligible for one of these tickets. Each costs 100,- Euro. We also have a special group ticket available: If you buy four tickets or more, you are eligible for a discount of 25%; if you buy 10 tickets or more, the discount is 50%. Learn more at

One day before the conference we are having a Berlin Buzzwords Barcamp in town. In addition, directly after the conference, Cloudera will be hosting Apache Hadoop trainings - registration is separate from Berlin Buzzwords.

So just in case you need a good excuse for a longer trip to Berlin: You can spend the weekend in town, attend the Barcamp on Sunday evening, and visit Berlin Buzzwords on Monday and Tuesday. The rest of the week could be used to take part in the Apache Hadoop trainings. Finally you have one weekend left for a city tour.

Thanks to Jan Lehnardt, Simon Willnauer and newthinking communications for co-organising the event.

Berlin Buzzwords - End of CfP drawing closer

2010-04-11 14:55
One week to go for submitting a talk on your favourite NoSQL topic, your favourite search application or your most interesting data analysis task: The call for presentations for Berlin Buzzwords ends on April 17th, that is Saturday next week.

Shortly after the last talk has been submitted we will start announcing speakers. The final list of speakers is expected by the start of May; the final schedule will be published shortly after that.

Berlin Buzzwords - Early bird registration

2010-04-10 15:02
I would like to invite everyone interested in data storage, analysis and search to join us for two days on June 7th/8th in Berlin for Berlin Buzzwords - an in-depth, technical, developer-focused conference located in the heart of Europe. Presentations will range from beginner-friendly introductions to the hot data analysis topics up to in-depth technical presentations of scalable architectures.

Our intention is to bring together users and developers of data storage, analysis and search projects. Meet members of the development team working on projects you use. Get in touch with other developers you may know only from mailing list discussions. Exchange ideas with those using your software and get their feedback while having a drink in one of Berlin's many bars.

Early bird registration has been extended until April 17th - so don't wait too long.

If you would like to submit a talk yourself: Conference submission is open for a little more than one week. More details are available online in the call for presentations:

Looking forward to meeting you in the beautiful, vibrant city of Berlin this summer for a conference packed with high profile speakers, awesome talks and lots of interesting discussions.

Seminar on scaling learning at DIMA TU Berlin

2010-03-17 21:10
Last Thursday the seminar on scaling learning problems took place at DIMA at TU Berlin. We had five students give talks.

The talks started with an introduction to map reduce. Oleg Mayevskiy first explained the basic concept, then gave an overview of the parallelization architecture and finally showed how jobs can be formulated as map reduce jobs.
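
To make the idea of formulating a job as map and reduce steps concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the word-count task and the names map_phase, shuffle, reduce_phase and run_job are assumptions chosen for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce sequentially over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each call to map_phase only looks at one document and each call to reduce_phase only looks at one key, both steps could in principle be spread over many machines; only the grouping in between needs to move data around.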

His paper as well as his slides are available online.

Second was Daniel Georg - he was working on the rather broad topic of NoSQL databases. As the topic is too fuzzy to be covered in one 20-minute talk, Daniel focussed on distributed solutions - namely Bigtable/HBase and Yahoo! PNUTS.

Daniel's paper and slides are available online as well.

Third was Dirk Dieter Flamming on duplicate detection. He concentrated on algorithms for near duplicate detection needed when building information retrieval systems that work with real world documents: The web is full of copies, mirrors, near duplicates and documents made of partial copies. The important task is to identify near duplicates to not only reduce the data store but to potentially be able to track original authorship over time.
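
One common way to approach near duplicate detection is to compare documents by their overlapping word sequences. The following is a minimal sketch of that idea using word shingles and Jaccard similarity; it is not necessarily one of the algorithms covered in the talk, and the function names, the shingle size k=3 and the threshold of 0.5 are assumptions picked for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) of a text."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words form a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.5):
    """Treat two documents as near duplicates if their shingle sets overlap enough."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

Comparing every pair of documents this way does not scale to web-sized collections; in practice the shingle sets are usually compressed into small fingerprints first, which is beyond this sketch.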

Again, paper and slides are available online.

After a short break, Qiuyan Xu presented ways to learn ranking functions from explicit as well as implicit user feedback. Any interaction with search engines provides valuable feedback about the quality of the current ranking function. Watching users - and learning from their clicks - can help to improve future ranking functions.

A very detailed paper as well as slides are available for download.

The last talk was by Robert Kubiak on topic detection and tracking. The talk presented methods for identifying and tracking upcoming topics e.g. in news streams or blog postings. Given the amount of new information published digitally each day, these systems can help by following interesting news topics or by sending notifications on new, upcoming topics.

Paper and slides are available online.

If you are a student in Berlin interested in scalable machine learning: The next course IMPRO2 has been set up. As last year, the goal is to not only improve your skills in writing code but also to interact with the community and, if appropriate, to contribute back the work created during the course.

Call for presentations - Berlin Buzzwords

2010-03-11 15:09

Call for Presentations Berlin Buzzwords
Berlin Buzzwords 2010 - Search, Store, Scale
7/8 June 2010

This is to announce the opening of the Berlin Buzzwords 2010 call for presentations. Berlin Buzzwords is the first conference on scalable and open search, data processing and data storage in Germany, taking place in Berlin.

The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics:

  • Information retrieval, search - Lucene, Solr, katta or comparable solutions
  • NoSQL - like CouchDB, MongoDB, Jackrabbit, HBase and others
  • Hadoop - Hadoop itself, MapReduce, Cascading or Pig and relatives

Closely related topics not explicitly listed above are welcome. We are looking for presentations on the implementation of the systems themselves, real world applications and case studies.

Important Dates (all dates in GMT +2)

  • Submission deadline: April 17th 2010, 23:59
  • Notification of accepted speakers: May 1st, 2010.
  • Publication of final schedule: May 9th, 2010.
  • Conference: June 7/8, 2010.

High quality, technical submissions are called for, ranging from principles to practice. We are looking for real world use cases, background on the architecture of specific projects and a deep dive into architectures built on top of e.g. Hadoop clusters.

Proposals should be submitted at no later than April 17th, 2010. Acceptance notifications will be sent out on May 1st. Please include your name, bio and email, the title of the talk, and a brief abstract in English. Please indicate whether you want to give a short (30min) or long (45min) presentation and indicate the level of experience with the topic your audience should have (e.g. whether your talk will be suitable for newbies or is targeted at experienced users).

The presentation format is short: either 30 or 45 minutes including questions. We will be enforcing the schedule rigorously.

If you are interested in sponsoring the event (e.g. we would be happy to provide videos after the event, free drinks for attendees as well as an after-show party), please contact us.

Follow @hadoopberlin on Twitter for updates. News on the conference will be published on our website at

Program Chairs: Isabel Drost, Jan Lehnardt, and Simon Willnauer.

Schedule and further updates on the event will be published on

Early bird registration for Berlin Buzzwords on June 7th/8th open

2010-03-09 18:34
Registration was quietly opened in the past few days for Berlin Buzzwords - a conference on scaling search, data processing and storage taking place on June 7th/8th in Berlin, Germany. The first 100 tickets will be sold for 250 Euros + tax. Registration is possible at later dates as well, however expect prices to rise shortly before the conference starts.

If you clicked on it earlier this week and were wondering what those strange German terms were all about: We have put online an English version as well, so language shouldn't be much of a problem anymore.

To avoid any confusion: Conference talks will be in English - no German language skills needed for that. It is perfectly possible to get around in Berlin without speaking German, however, as always, knowing a few words will make it easier to make friends with people in shops and hotels ;)

Open Community Camp 2010

2010-02-12 13:07
The following information just reached me via Marten Vijn. Thought it might be interesting to you:

I am pleased to announce OpenCommunityCamp 2010.

The camp is from 10th to 18th July, in Oegstgeest, the Netherlands.

The website[1] is refreshed and the first speakers are booked.
It is time to register[2] if you plan be there (please do this

Currently we need to find more people to attend, self-organizing groups
for the day program and interesting speakers for the evening program.

I look forward to hear your ideas and plan if you come. If you have
any questions don't hesitate to mail me.

kind regards,
Marten Vijn


Berlin Buzzwords - June 2010

2010-02-11 22:42
As announced at FOSDEM: Early June (currently scheduled for 7th/8th) a conference on the topics scalable search, storage and processing will take place in Kalkscheune/Berlin. The conference is co-organised by newthinking store, Jan Lehnardt, Simon Willnauer, Thilo Fromm, and Isabel Drost.

The focus will be on NoSQL databases like CouchDB, Jackrabbit, MongoDB, HBase. Search tracks will cover topics like Lucene, Solr, katta and others. Data munging tracks will focus mainly on Hadoop, MapReduce in general and distributed systems.

More information including the call for presentations will be made available online next week on a separate webpage. Early registration starts in March. Watch this blog for more information or follow @hadoopberlin.

First NoSQL Meetup in Germany

2009-09-09 18:58
On October 22nd 2009 the first NoSQL Meetup Germany is going to take place in newthinking store/ Berlin:

Please submit your presentation proposals by September 22nd; accepted speakers will be notified soon after.

If you would like to sponsor the event, feel free to contact us: We would be very happy to provide videos after the event and free drinks for everyone during the event.

Hope to see you soon in Berlin.

June 2009 Apache Hadoop Get Together @ Berlin

2009-06-21 21:33
Just a brief reminder: Next week on Thursday the next Apache Hadoop Get Together is scheduled to take place in Berlin. There are quite a few interesting talks scheduled:

  • Torsten Curdt: Data Legacy - the challenges of an evolving data warehouse
  • Christoph M. Friedrich, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI): SCAIView - Lucene for Life Science Knowledge Discovery
  • Uri Boness from JTeam in Amsterdam: Solr - From Theory to Practice

See for more information.

For those interested in NOSQL Meetups, the discussion over at the NOSQL mailing list might be of interest to you: