O'Reilly Strata - Tutorial data analytics



Acting based on data

It comes as no surprise that in the data analytics world, too, engineers are often unwilling to share details of how their analysis works with higher management, while management in turn shows little interest in learning how analytics really works. This culture leads to a black art, witchcraft attitude towards data analytics that hinders innovation.

When starting to establish data analytics in your business there are a few things to consider: First of all, no matter how beautiful the visualizations look in the tool you are considering buying, keep in mind that shiny pebbles won't solve your problems. Instead, focus on what kind of information you really want to extract and choose the tool that does that job best. Also keep in mind that data never comes as clean as analysts would love it to be.

  • Ask yourself how complete your data really is (Are all fields you are looking at filled for all relevant records?).
  • Are those fields filled with accurate information? (Ever asked yourself why everyone using your registration form seems to work for a startup with 1-100 engineers instead of picking one of the many other options further down the list?)
  • For how long will that data remain accurate?
  • For how long will it be relevant for your business case?
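The first two questions above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the function names `field_completeness` and `dominant_value_share`, the record layout and the sample values are all assumptions made for illustration. It reports how many records have each field filled, and how strongly a single value dominates a field, which may hint that users simply keep the first option of a drop-down.

```python
from collections import Counter


def is_filled(value):
    """Treat None and the empty string as 'not filled' (an assumption)."""
    return value not in (None, "")


def field_completeness(records, fields):
    """Return, per field, the fraction of records with a filled value."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for record in records if is_filled(record.get(field)))
        result[field] = filled / total if total else 0.0
    return result


def dominant_value_share(records, field):
    """Return the most common filled value of a field and its share."""
    values = [r[field] for r in records if is_filled(r.get(field))]
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


# Hypothetical registration records, invented for this example.
records = [
    {"email": "a@example.com", "company_size": "1-100"},
    {"email": "", "company_size": "1-100"},
    {"email": "c@example.com", "company_size": "1-100"},
    {"email": "d@example.com", "company_size": "101-1000"},
]

completeness = field_completeness(records, ["email", "company_size"])
top_value, top_share = dominant_value_share(records, "company_size")
print(completeness)
print(top_value, top_share)
```

A high share for the first option in a list does not prove the data is wrong, but it is a cheap signal that the field deserves a closer look before any segmentation is built on it.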

Even the cleanest data set will get you only so far: you need to be able to link your data back to actual transactions in order to segment your customers and derive value from data analytics.

When introducing data analytics, check whether people are actually willing to share their data. Also check whether management is willing to act on potential results - that may be as easy as spending lots of money on data cleansing, or it may involve changing workflows in order to provide better source data. Data analytics may also lead to even more severe changes: Are people willing to change the product based purely on data? Are they willing to adjust the marketing budget? ... job descriptions? ... the development budget? How fast is the turnaround for these changes? If changes are only made once a year, there is no value in having realtime analytics.

In the end it boils down to applying the OODA cycle: only if you can observe, orient, decide and act faster than your competitor do you have a real business advantage.

Data analytics ethics

Today Apache Hadoop provides the means to give data analytics super powers to everyone: it combines the use of commodity hardware with the ability to scale to high data volumes. With great power there must also come great responsibility, to quote Stan Lee. In the realm of data science this means being asked to solve problems that are technologically trivial but ethically questionable at the very least:

  • Helping others adjust their weapons to increase death rates.
  • Making others turn into a monopoly.
  • Predicting the likelihood of cheap food making you so sick that you are able and willing to go to court against the provider as a result.

On the other hand, data analytics can solve cases in a way that makes sense for both the provider and the customer: predicting when visitors to a casino are about to become unhappy and willing to leave, before they even know it themselves, may give the casino employees a brief time window for counteractions (e.g. offering them a free meal).

In the end it boils down to not screwing up other people's lives: deciding which action does the least harm while achieving the most benefit, which treats people at least proportionally if not equally, and which serves the community as a whole - or more simply: what leads me to being the person I always wanted to be.