Big data is too big, be more specific

This post is inspired by a tweet I made on November 12, which I’m pasting here: “Altho also guilty of this, I’m tired of hearing the buzz words gamification & big data. I promise to talk specifics – please do the same 🙂 ” I’ve stuck to my promise, and am now going to address why I believe we need to be more specific when talking about big data. It’s been over a month and a half since I made the comment so my thoughts on this topic must have grown subconsciously for quite some time. If something similar has been happening with the part of the comment addressing gamification, you’re also likely to see a post on that topic soon.

The term “big data” is simply a way to highlight the tremendous amount of data which is being generated in today’s world. Mobile phones, sensors, and cameras are just some of the devices which create this data. The proliferation of these devices, coupled with the rapid rise in their computing power, is producing reams of complex data sets. The objective of big data startups is to analyze these data sets so as to extract insights from otherwise meaningless pools of information. This is where things get tricky. What is insightful depends largely on what you’re trying to do. A bank manager has very different needs than a soccer coach or the manager of a data processing center (each of these is the target customer of a big data startup that I’ve recently evaluated). In order to meet these needs, big data startups need to understand what factors influence the performance of the industry which they’re serving, and how these factors interact with each other to produce insightful information. If you’re familiar with regression models, you can think of this as identifying the right set of explanatory variables to track and determining their relative impact on the response variable which you’re trying to optimize.
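If the regression analogy is new to you, here’s a rough sketch of what I mean. Everything in it is made up for illustration: the three explanatory variables (payday, month end, branch visits), the numbers, and the "true" effects used to generate the response are all my own assumptions, not data from any of the startups mentioned above. It uses NumPy’s least-squares solver to recover how much each variable moves the response.

```python
import numpy as np

# Hypothetical explanatory variables for eight days at one bank branch.
# Columns: is_payday (0/1), is_month_end (0/1), branch visits (in hundreds).
X = np.array([
    [0, 0, 1.0],
    [1, 0, 1.5],
    [0, 1, 1.2],
    [1, 1, 2.0],
    [0, 0, 0.8],
    [1, 0, 1.7],
    [0, 1, 1.1],
    [0, 0, 1.3],
])

# Made-up "true" effects, used only to generate a noise-free response
# (daily withdrawals, in thousands) so the fit can be checked by eye.
true_intercept = 20.0
true_effects = np.array([15.0, 8.0, 10.0])
y = true_intercept + X @ true_effects


def fit_linear_model(X, y):
    """Ordinary least squares; returns (intercept, coefficients)."""
    # Prepend a column of ones so the model can learn an intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


intercept, effects = fit_linear_model(X, y)
for name, effect in zip(["is_payday", "is_month_end", "visits"], effects):
    print(f"{name}: {effect:.1f}")
```

The fitting is the easy part. Knowing that payday and month end belong in the table at all is the part that takes industry experience.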

The core issue is that it takes a lot of industry experience to really understand the drivers of a particular business. Take the example of the bank manager who, among other goals, is looking to optimize the amount of cash on hand at a particular branch. Hold too much cash and you forgo valuable interest. Hold too little and you may not be able to meet customer withdrawal requests. It takes years of experience to know which factors you should track to determine how much cash you are likely to need on any particular day.
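To make the cash trade-off concrete, here’s a small sketch under assumptions of my own: the daily withdrawal figures, the cost of a unit of idle cash (forgone interest), and the cost of a unit of shortfall are all hypothetical, and a real branch would weigh far more than this. It simply scores each candidate cash level against past demand and picks the cheapest one.

```python
def expected_cost(cash_on_hand, demands, idle_cost_per_unit, shortfall_cost_per_unit):
    """Average daily cost of holding `cash_on_hand` against observed demands."""
    total = 0.0
    for demand in demands:
        idle = max(cash_on_hand - demand, 0)       # cash earning no interest
        shortfall = max(demand - cash_on_hand, 0)  # withdrawals we couldn't meet
        total += idle * idle_cost_per_unit + shortfall * shortfall_cost_per_unit
    return total / len(demands)


def best_cash_level(candidates, demands, idle_cost_per_unit, shortfall_cost_per_unit):
    """Candidate cash level with the lowest average cost."""
    return min(
        candidates,
        key=lambda cash: expected_cost(
            cash, demands, idle_cost_per_unit, shortfall_cost_per_unit
        ),
    )


# Hypothetical daily withdrawal demand (in thousands) over one week.
demands = [40, 55, 60, 45, 80, 50, 65]
candidates = range(40, 85, 5)

# Assumption: failing a withdrawal hurts four times as much as idle cash.
best = best_cash_level(
    candidates, demands, idle_cost_per_unit=1.0, shortfall_cost_per_unit=4.0
)
print(best)
```

With these made-up numbers the cheapest level is 65. That sits above the typical day’s demand because a shortfall is assumed to cost more than idle cash. The arithmetic is trivial; what an insider brings is knowing the two cost figures and which days will look nothing like last week.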

If you’re an industry outsider, you won’t know this information yourself. Speaking to the bank manager is one way to get it, but this approach has two drawbacks. First, because you’re new to the industry, you’re likely to miss or misunderstand valuable information which the bank manager is conveying. Second, the bank manager may be overlooking an important variable which, if tracked, would greatly improve the quality of the assessment, and since you’re not familiar with the industry, you’re unlikely to recognize this. The solution to both these problems is to be an industry insider. To build a big data startup which improves the quality of the critical decisions which your target customer makes, you need to have years of experience walking in their shoes.

The best big data startups are not those filled exclusively with data scientists who have the technical ability to manipulate spreadsheets containing millions of rows and columns. Although this technical ability is valuable, it needs to be complemented with a deep understanding of the industry which you’re trying to serve. What matters? What doesn’t? How can we analyze data to improve what matters? The best big data startups are those where the founders and many team members have years of experience in the industry which they’re serving. They have their own answers to each of these questions, and are able to engage in constructive dialogues with their customers to arrive at even better answers. This is why saying that you’re a big data startup isn’t enough. You need to be specific about why your team is uniquely positioned to understand your target customer better than anyone else, and provide examples of insightful analyses which support this claim.