Apache brand name - survey

2010-02-16 19:03
Sally Khudairi (VP, ASF Marketing & Publicity) asked that the following survey be distributed to people who might be interested in contributing their views to a study on how the brand name Apache is perceived. Personally, I would be especially interested in finding out whether perception differs inside the ASF vs. outside...


We have been working with PhD candidate Roland Schroll over the past two years as he's been compiling information on the value of the Apache brand. His advisor is community-based innovation expert Dr. Johann Füller. This is a joint project of the University of Innsbruck and the Massachusetts Institute of Technology.

If you have 10 minutes to help, it would be much appreciated. The survey is at http://surveys.hyvelive.de/10_apache/p1.php?refGroup=Apache

They would like the surveys to be completed this month (February).

They are seeking at least 300 respondents. As such, if you know others who are interested in Apache from a market perspective, feel free to forward the link to them as well.

Thanks in advance for your interest!

Kind regards,
Sally Khudairi
VP, ASF Marketing & Publicity


Apache Dinner January 2010

2010-01-18 22:48
This evening in X-Berg several local committers met for the second "Apache Dinner" - an informal gathering of local Apache committers, friends and associates for food, beer and interesting discussions. The next one will probably be scheduled for some time in February. Feel free to send a message to Torsten Curdt to be included on the next invitation mail. Thanks for organizing a nice evening, Torsten. Hope to see even more Apache friends at the next dinner ;)

First December Apache Hadoop Berlin video online

2009-12-31 20:27
The video of Nikolaus Pohle's talk at the December Apache Hadoop Get Together Berlin is online already - more to come soon.

hadoop nikolaus pohle from Isabel Drost on Vimeo.



Thanks to Martin from newthinking for videotaping and uploading. Thanks to StudiVZ for sponsoring the video.

On Thursday: Open Hadoop User Group Munich

2009-12-16 06:06
If one evening of Apache Hadoop is not enough for you: The next Christmas Meetup in Germany takes place one day later in Munich.


  • When: Thursday, December 17, 2009 at 5:30pm, open end
  • Where: eCircle AG, Nymphenburger Straße 86, 80636 München ("Bruckmann" Building, "U1 Mailinger Str", map in German http://www.ecircle.com/de/kontakt/anfahrt.html and look for the signs)


Talks scheduled by Bob and Lars:

Bob Schulze from eCircle will be giving the first presentation on how eCircle is planning to use the Hadoop stack.

Dave Butlerdi will be giving an overview of his usage of Hadoop.

Lars George will report on the state of the HBase project: what it is, what it does and how he has been using it since early 2008.

There is a quick train connection from Berlin to Munich. So if you are attending the Berlin Get Together, it is very easy to travel south to Munich one day later and visit the Munich event as well.

Apache Hadoop at FOSDEM 2010

2009-12-11 09:19
Though the official schedule is not yet online: I will be giving an introductory talk about Apache Hadoop at next year's FOSDEM (Free and Open Source Software Developers' European Meeting) in Brussels. This will be the 10th birthday of the event - looking forward to a fun event, meeting other free and open source software developers from all over Europe.





If you are an Apache Hadoop developer and would like me to include some particular topic in the talk - please feel free to contact me. If you are an Apache Hadoop user and would like to learn more about the project, please come to the talk and ask questions. If you are an Apache Hadoop newbie - feel free to join us.

In addition there will be a NoSQL Dev Room at FOSDEM as well. The call for presentations is up already. So if you are doing fun stuff with CouchDB, HBase and friends or are a developer of these projects - submit a talk and join us in early February in Brussels.

First Apache Dinner Berlin

2009-11-25 02:33
A few days ago, I received a mail from Torsten Curdt that read something like: "[...] For a long time now I wanted to organise an Apache Dinner Berlin. What do you think, when would be a good time for that?". As that was about the third time I heard of that idea (and the third person mentioning it), I wrote to some Berlin-based Apache people, asking whether they would be interested in having an Apache Dinner on November 24th in X-Berg. General answer: Yes! Sure!

The idea was to make it open to anyone interested in the ASF and to send invitations to committers who are living in the greater Berlin area. Then book a table, have some food, get some drinks...

We met in Graefekiez - we, that is Torsten (Jakarta and Hadoop), Jan and Daniel (CouchDB), Simon+Vera (Lucene), oswald (xampp), Eric (HttpComponents) and myself - for a great "small menu" at La Buona Forchetta (thanks to Torsten for coming up with that restaurant and booking the table). After that some of us moved over to a bar close to the restaurant.

After a long evening with lots of interesting (cross-project as well as non-technical) discussions, the general conclusion was to organize another Apache Dinner some time in January, after Christmas time is over.

Thanks guys for a great evening. Hope to see you all - as well as a few more Apache people from around Berlin - in January. Date and location to be set.

Final note to self: No Club Mate for Isabel after 02:00 a.m. ...

ApacheCon Oakland Roundup

2009-11-19 20:15
Two weeks ago ApacheCon US 2009 ended in Oakland, California. Shane published a set of links to articles that contain information on what happened at ApacheCon. Some of them are officially published by the Apache PRC project, others are write-ups by individuals on which talks they attended and which topics they considered particularly interesting.

Mahout 0.2 released

2009-11-18 10:52
Apache Mahout 0.2 has been released and is now available for public download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Up-to-date Maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/


Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license. http://www.apache.org/licenses/LICENSE-2.0

Mahout is a machine learning library meant to scale: Scale in terms of community to support anyone interested in using machine learning. Scale in terms of business by providing the library under a commercially friendly, free software license. Scale in terms of computation to the size of data we manage today.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop project, Mahout lets you solve popular machine learning problem settings like clustering, collaborative filtering and classification over terabytes of data on thousands of computers.
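
To make the map/reduce idea a bit more concrete, here is a minimal sketch in plain Python of a single k-means clustering iteration split into a map and a reduce phase. It uses neither Hadoop nor Mahout and is not meant to show how Mahout implements clustering; the function names (map_phase, shuffle, reduce_phase, kmeans_step) and the one-dimensional toy data are made up for illustration.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for point in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: (point - centroids[i]) ** 2,
        )
        yield nearest, point


def shuffle(pairs):
    """Group emitted values by key (a framework would do this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: the new centroid of each cluster is the mean of its points."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def kmeans_step(points, centroids):
    """One k-means iteration expressed as map -> shuffle -> reduce."""
    return reduce_phase(shuffle(map_phase(points, centroids)))


if __name__ == "__main__":
    # Toy 1-D data: two obvious clusters around 1.5 and 10.
    print(kmeans_step([1.0, 2.0, 9.0, 11.0], [0.0, 10.0]))
```

In this sketch everything runs in one process; on a cluster the map and reduce calls would be spread over many machines, each working on its own part of the data.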

Implemented with scalability in mind, the latest release brings many performance optimizations so that the library performs well even in a single-node setup.

The complete changelist can be found here:

http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include


  • Major performance enhancements in Collaborative Filtering, Classification and Clustering
  • New: Latent Dirichlet Allocation (LDA) implementation for topic modelling
  • New: Frequent Itemset Mining for mining top-k patterns from a list of transactions
  • New: Decision Forests implementation for Decision Tree classification (In Memory & Partial Data)
  • New: HBase storage support for Naive Bayes model building and classification
  • New: Generation of vectors from Text documents for use with Mahout Algorithms
  • Performance improvements in various Vector implementations
  • Tons of bug fixes and code cleanup


Getting started: New to Mahout?



For more information on Apache Mahout, see http://lucene.apache.org/mahout

A very BIG Thank You to all those who made this release happen!

Open Source Expo 09

2009-11-16 22:17
I spent last Sunday and the following Monday at Open Source Expo Karlsruhe - co-located with web-tech and php-conference organized by the Software-and-Support Verlag. Together with Simon Willnauer I ran the Lucene/Mahout booth at the expo.

So far the conference is still very small (about 400 visitors) compared to free software community events. However, the focus was set to be more on professional users; accordingly, several projects showed that free software can be used successfully for various business use cases. Visitors were invited to ask Sun about their free software strategy. Questions concerning OpenJDK or MySQL were not uncommon. Large distributors like SuSE or Mandriva were present as well, as were smaller companies, e.g. ones providing support for Apache OFBiz.

The Apache Lucene project was invited as an exhibitor as well. Together with PRC and ConCom we arranged for an Apache banner. Lucid Imagination sponsored several Lucene T-Shirts to be distributed at the conference. At the very last minute, information (abstract, links to projects and mailing lists, and current users) was put together on flyers.

We arrived on Saturday, late in the evening. Together with a friend of mine we went for some Indian food at a really good restaurant close to the hotel. Big thanks to her for being our tourist guide - hope to see you back in Waldheim in December ;)



Sunday was pretty quiet - only a few guests arrived on the weekend. I was invited by David Zuelke to give a brief introduction to Mahout during his MapReduce Hadoop tutorial workshop. Thanks, David. Though lunch was served already, people did stay to hear my presentation on large scale machine learning with Mahout. I got contacted by one of the students of Katharina Morik who was pretty interested in the project. Back at her research group people are working on RapidMiner - a tool for easy machine learning. It comes with a graphical user interface that makes it simple to explore various algorithm configurations and data workflow setups. It would be interesting to see how this tool helps people to understand machine learning. It would also be very interesting to learn what form of contribution to Mahout might be appropriate for research groups. Maybe not code-wise but more in terms of discussions and background knowledge.

Monday was a bit busier, with more people attending the conferences. Simon got a slot to present Lucene at the Open Stage track and show off the new features of Lucene 2.9. Those using Lucene already could be tricked into telling their Lucene success story at the beginning of the talk. At the booth we had a wide variety of people: from students trying to find a crawling and indexing system for their information retrieval course homework up to professionals with various questions on the Apache Lucene project. The experience of people at the conference varied widely. That proved to be a pretty good reality check. Being part of the Lucene and the ASF community, one might be tempted to think that not knowing about Lucene is almost impossible. Well, it seems to be less impossible than I, at least, expected.

One last success: As the picture shows, YaCy now is powered by Lucene as well - at least in terms of T-Shirt ;)

Apache Con US Wrap Up

2009-11-16 22:10
Some weeks ago I attended ApacheCon US 09 in Oakland, California. In the meantime, videos of one of the sessions have been published online.

You can find a wrap-up of the most prominent topics at the conference at heise (unfortunately in German only).

By far the largest topics at the conference:
  • Lucene - there was a meetup with over 100 attendees as well as two main tracks with Lucene-focussed talks. New features of Lucene 2.9.* were at the center of interest: the new range search capabilities, segment search that improves caching, a new token stream API that makes annotating terms more flexible, as well as a lot of performance improvements. Shortly after the conference, Lucene 2.9.1 as well as Solr 1.4 were released, so end users switching to the new versions now benefit from better performance and several new features.
  • Hadoop - large scale data processing currently is one of the biggest topics, be it logfile analysis, business intelligence or ad-hoc analysis of user data. Hadoop was covered by a user meetup as well as one track on the first conference day. The track started with an introduction by Owen O'Malley and Doug Cutting. It continued with talks on HBase, Hive, Pig and other projects from the Hadoop ecosystem.


Projects like Apache Tomcat and Apache HTTPD were well covered too, with one to two sessions each.

Currently a hot topic within the foundation is the challenge of bringing the community together face-to-face. Apache projects have become so numerous that covering them all within 3+2 days of conference and trainings no longer seems feasible. One way to mitigate this problem might be to motivate people to do more local meetups, potentially supported by ConCom, as has already happened in the Lucene and Hadoop communities. A related topic is the task of community building and community growth within the ASF. Google Summer of Code has been a great way to integrate new people. However, the model does not scale that well for the foundation. With ComDev a new project was founded with the goal of working on community development issues, talking to research and getting students into open source early on. The project is largely supported by Ross Gardler, who already has experience with teaching and promoting open source and free software in the research context, being part of the open source watch project in the UK.

Apache Con US 09 brought together a large community of Apache software developers and users from all over the world who gathered in California, not only for the talks but also for face-to-face communication, coding together and exchanging ideas.

Update: Slides of my Mahout talk are now online.