What Data Mining Can and Can’t Do
June 15, 2007
Peter Fader, professor of marketing at the University of Pennsylvania’s Wharton School, is the ultimate marketing quant: a world-class, award-winning expert on using behavioral data in sales forecasting and customer relationship management. He’s perhaps best known for his July 2000 expert witness testimony before the U.S. District Court in San Francisco that Napster actually boosted music sales. (Napster was then the subject of an injunction for copyright infringement and other allegations brought against it by several major music companies.)
The energetic and engaging marketing professor has a pet peeve: He hates to see companies waste time and money collecting terabytes of customer data in attempts to make conclusions and predictions that simply can’t be made. Fader has come up with an alternative, which he is researching and teaching: Complement data mining with probability models, which, he says, can be surprisingly simple to create. The following is an edited version of his conversation with CIO Insight Executive Editor Allan Alter.
CIO INSIGHT: What are the strengths and weaknesses of data mining and business intelligence tools?
FADER: Data mining tools are very good for classification purposes, for trying to understand why one group of people is different from another. What makes some people good credit risks or bad credit risks? What makes people Republicans or Democrats? To do that kind of task, I can’t think of anything better than data mining techniques, and I think it justifies some of the money that’s spent on it. Another question that’s really important isn’t which bucket people fall into, but when will things occur? How long will it be until this prospect becomes a customer? How long until this customer makes the next purchase? So many of the questions we ask have a longitudinal nature, and I think in that area data mining is quite weak. Data mining is good at saying, will it happen or not, but it’s not particularly good at saying when things will happen.
Data mining can be good for certain time-sensitive things, like is this retailer the kind that would probably order a particular product during the Christmas season. But when you want to make specific forecasts about what particular customers are likely to do in the future, not just which brand they’re likely to buy next, you need different sets of tools. There’s a tremendous amount of intractable randomness to people’s behavior that can’t be captured simply by collecting 600 different explanatory variables about the customer, which is what data mining is all about.
People keep thinking that if we collect more data, if we just understand more about customers, we can resolve all the uncertainty. It will never, ever work that way. The reasons people, say, drop one cell phone provider and switch to another are pretty much random. It happens for reasons that can’t be captured in a data warehouse. It could be an argument with a spouse, it could be that a kid hurt his ankle in a ballgame so he needs to do something, it could be that he saw something on TV. Rather than trying to expand data warehouses, in some sense my view is to wave the white flag and say let’s not even bother trying.
Do you think people understand the limitations of data mining?
They don’t. And this has nothing to do with data mining or marketing, but it has a lot to do with human nature. We’re seeing the same issues arising in every area of science. As data collection technology and model-building capabilities get better, people keep thinking they can answer the previously unknowable questions. But whether it’s the causes of diseases or mechanical failure, there’s only so much we can pin down by capturing data.
Do people who use data mining packages understand enough about how to use them?
I can’t make generalizations that are too broad, but there are some people who are hammers looking for nails. They think they can answer any problem using one set of procedures, and that’s a big mistake. When you go into other domains, you need to pull out different tools. One of the things that just makes me crazy is when people misuse the kinds of statistics that are associated with data mining. A lift curve will show us how well our predicted rank order of customer propensities corresponded to their actual behavior. That’s a fine thing to do in a classification setting, but it’s not particularly diagnostic in a longitudinal setting. We want ‘when’-type diagnostics to answer ‘when’-type questions. People just aren’t looking in the right places to see whether their model’s working.
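For readers unfamiliar with the lift curve mentioned above, here is a minimal Python sketch of the cumulative-gains calculation behind it. The scores and outcomes are made-up illustrative values, and the function name `cumulative_gains` is hypothetical, not taken from any data mining package.

```python
def cumulative_gains(scores, outcomes, n_bins=5):
    """Share of all actual responders captured in the top-ranked bins.

    scores   -- predicted propensities (higher means more likely to respond)
    outcomes -- 1 if the customer actually responded, else 0
    """
    # Rank customers from highest to lowest predicted score.
    ranked = [o for _, o in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    total_responders = sum(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(len(ranked) * b / n_bins)
        gains.append(sum(ranked[:cutoff]) / total_responders)
    return gains


# Hypothetical data: ten customers, four of whom responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

gains = cumulative_gains(scores, outcomes)
# Lift in the top bin: share of responders captured divided by share targeted.
lift_top_bin = gains[0] * len(gains)
print(gains)
print(lift_top_bin)
```

The calculation only checks whether the ranking put responders near the top. It carries no information about when anyone responded, which is the gap Fader describes.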
Exactly what do you mean by a propensity as opposed to a behavior?
The difference is that just because people have a tendency to do things doesn’t mean that they will. You might be someone who buys from Amazon once a month on average. Does that mean over the next 10 years, over the next 120 months, you’ll buy 120 items? No. You could go two years without buying, or you might buy five items in a given month. The amount of variability around your propensity is huge. That’s where all this randomness comes in.
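The gap between a propensity and a behavior can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that purchases follow a Poisson process with a constant rate of one per month; that assumption is not stated in the interview, and the helper `poisson_pmf` is a hypothetical name.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is `mean`."""
    # Computed on the log scale so large k does not overflow.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


monthly_rate = 1.0  # assumed propensity: once a month on average
months = 120        # ten years

p_no_purchase_month = poisson_pmf(0, monthly_rate)
p_five_plus_month = 1.0 - sum(poisson_pmf(k, monthly_rate) for k in range(5))
p_exactly_120 = poisson_pmf(120, monthly_rate * months)

print(f"P(no purchase in a month)    = {p_no_purchase_month:.3f}")
print(f"P(5+ purchases in a month)   = {p_five_plus_month:.4f}")
print(f"P(exactly 120 in 120 months) = {p_exactly_120:.3f}")
```

Even under this very simple assumption, a customer with a once-a-month propensity buys nothing in roughly 37 percent of months, and the chance of landing on exactly 120 purchases in ten years is only about 4 percent. A constant-rate model makes a two-year gap extremely unlikely, so dry spells of that length point to propensities that also shift over time or differ across people.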
Have companies hurt themselves by misusing data mining tools?
Let me start with a positive example. I have tremendous admiration for what actuaries do, and therefore for the way insurance companies deal with their customers. Actuaries will not look at all your characteristics and say when you will die. They’ll simply come up with a probabilistic statement about the likelihood that someone with your characteristics will die, or what percent of people who share characteristics will live to be 70. They understand that it’s pretty much impossible to make statements about each and every policyholder.
Now, carry that over to the marketing world. Lots of firms talk about one-to-one marketing. I think that’s a real disservice to most industries. One-to-one marketing only works when you have a very deep relationship with every customer. So one-to-one marketing works great in private wealth management, or in a business-to-business setting where you meet with the client at least once a month, and understand not just their business needs but what’s going on in their life. But in areas approaching a mass market, where you can’t truly distinguish each individual, you just have a bunch of people and a bunch of characteristics that describe them. Then the notion of one-to-one marketing is terrible. It will do more harm than good, because the customers will act more randomly than you expect, and the cost of trying to figure out what specific customers will do far outweighs the benefits you could get from that level of detail.
It’s very hard to say who’s going to buy this thing and when. To take that uncertainty and square it by looking across two products, or to raise it to the nth power by looking across a large portfolio of products, and say “these two go together,” and make deterministic statements as opposed to talking about tendencies and probabilities, can be very, very harmful. It’s much more important for companies to come up with appropriate groupings of similar people, and make statements about them as a group.
I don’t want to pick on Amazon in particular; they really tout the capabilities of their recommendations systems. But maybe this customer was going to buy book B anyway, and therefore all the recommendations were irrelevant. Or maybe they were going to buy book C, which would have been a higher-margin item, so getting them to buy book B was a mistake. Or maybe they’re becoming so upset by irrelevant recommendations that they’re going away entirely. I don’t want in any way to suggest that cross-selling shouldn’t be done, but what I’m suggesting is that the net gains from it are less than people might think. It often can’t justify the kinds of investments that firms are making in it.
You’ve been championing the use of probability models as an alternative to data mining tools. What do you mean by a probability model?
Probability models are a class of models that people used back in the old days when data weren’t abundantly available. These modeling procedures are based on a few premises: People do things in a random manner; the randomness can be characterized by simple probability distributions; and the propensities for people to do things vary over time, across people and across circumstances. Probably the best known example is survival analysis, which stems largely from the actuarial sciences. It’s also used in manufacturing. You put a bunch of lightbulbs on a testing board and see how long they last. In many ways, that’s what I suggest we do with customers. We’re not going to make statements about any one lightbulb, just like we shouldn’t make statements about any one customer. We’ll make collective statements about how many of these bulbs will last for 1,000 hours. It turns out that the analogy of survival analysis in manufacturing and actuarial and life sciences carries over amazingly well to customers. A lot of managers would bristle at the idea, but I think that metaphor is far better than all this excessive customization and personalization that’s been going on. Customers are different from each other just as lightbulbs are, but for reasons that we can’t detect, and reasons that we’ll have a very hard time taking advantage of.
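The lightbulb test can be sketched in a few lines of Python. The lifetimes below are invented for illustration, and the sketch assumes every bulb was observed until it failed (no censoring), which real survival analysis would need to handle.

```python
def survival_fraction(lifetimes, t):
    """Fraction of units whose observed lifetime exceeds t."""
    return sum(1 for life in lifetimes if life > t) / len(lifetimes)


# Hypothetical lifetimes (hours) for eight bulbs on a test board.
bulb_hours = [450, 700, 980, 1020, 1100, 1300, 1600, 2100]

for hours in (500, 1000, 1500):
    print(hours, survival_fraction(bulb_hours, hours))
```

The output is a statement about the batch (here, five of eight bulbs last beyond 1,000 hours), not a prediction for any single bulb. Replacing bulbs with customers and hours with months of tenure gives the customer version of the same collective statement.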
What kind of problems can probability models solve?
Probability models have three basic building blocks: One is timing: how long until something happens. One is counting: how many arrivals, how many purchases or whatever will we see over a given period of time. And the third is choice: given an opportunity to do something, how many people will choose to do it. That’s it. Most real-world business problems are just some combination of those building blocks jammed together. For instance, if you’re modeling the total time someone spends at a Web site during a given month, you might model it as counting-timing: a count model for the number of visits and a timing model for the duration of each one. My view is that we can very easily build simple models in Excel for each of those three things. A lot of people have built this kind of model over the years, and have tested them very carefully, in some cases putting them directly up against data mining procedures. They have found that their capabilities are not only astonishing, but far better than data mining. If you think about all the different ways you can combine timing, counting and choice, you can tell all kinds of interesting stories about different business situations.
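The counting-timing combination for Web site visits might be sketched as follows. The specific distributions are assumptions chosen for simplicity and are not named in the interview: a Poisson count for visits per month and an exponential duration for each visit. The rates (4 visits a month, 5 minutes a visit) are made up.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_monthly_minutes(rng, visits_per_month, mean_minutes_per_visit):
    """Counting block (number of visits) combined with timing block (duration)."""
    n_visits = sample_poisson(rng, visits_per_month)
    return sum(rng.expovariate(1.0 / mean_minutes_per_visit)
               for _ in range(n_visits))


rng = random.Random(42)      # fixed seed so the run is repeatable
visits_per_month = 4.0       # assumed counting parameter
mean_minutes = 5.0           # assumed timing parameter

totals = [simulate_monthly_minutes(rng, visits_per_month, mean_minutes)
          for _ in range(20000)]
average_total = sum(totals) / len(totals)
share_no_visit = sum(1 for t in totals if t == 0) / len(totals)

print(f"average minutes per month: {average_total:.1f}")
print(f"share of months with no visit: {share_no_visit:.3f}")
```

The simulated average should land near 4 × 5 = 20 minutes, while individual months vary widely around it, including some months with no visits at all.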
How would you use these models to identify the most profitable customers or calculate customer lifetime value?
This is where probability models can come together beautifully with data mining. We can use these models to come up with very accurate forecasts about how long this customer will stay with us or how many purchases they’ll make over the next year. So use the basic probability model to capture the basic behavior and then bring in data mining to understand why groups of customers with different behavioral tendencies are different from each other. You see, behavior itself is not perfectly indicative of the true underlying propensities, which is what managers really want to know. And so we build a probability model that helps us uncover the propensities, and then we can take those propensities (the customer’s tendency to do something quickly or slowly, or to stay online a long time or not) and throw those into the data mining engine to explain those as a function of the 600 variables. You’ll find a much more satisfying and fruitful explanation in terms of being able to profile new customers and understand the likely actions of current ones. When it comes to taking the outputs of the probability model and understanding them, data mining procedures are the best way to go.
Can probability models capture longitudinal or predictive information?
Very, very well. In fact, one of my favorite examples is looking at customer retention and return. You can do it simply without any explanatory variables at all. The irony is that if you bring in explanatory variables, in many cases the model will do worse. This makes managers crazy. They need to know why these people are different. But if you’re bringing in explanatory variables that aren’t really capturing the true underlying reasons for the differences, then you’re just adding noise to the system. Your ability to come up with an accurate forecast for each group might actually be worse.
So you use data mining to help you figure out why those propensities exist.
That’s right. The key is to explain the propensities, the tendency to do things, as opposed to the behavior itself.
You said these models can be built in a spreadsheet. It doesn’t sound like you have to be a high-powered Ph.D. to create them.
Of course, that never hurts. But yes, these models are far more transparent to managers because the stories they tell are simpler, the demands on the data are far simpler, and the implementation is much easier. So what I like to do is to start people out with some of the really simple models and get people hooked. Show me how many customers we’ve had in year one, two, three, four, five, and I’ll tell you how many we’ll have in year nine and ten before we even bring in all the explanatory variables that data miners want to do.
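A forecast of this kind, from five years of customer counts to years nine and ten, could be sketched as below. The interview does not name a model; the shifted-beta-geometric retention model used here is one choice from this family, and it is an assumption of this sketch. The customer counts are hypothetical, and the coarse grid search stands in for a proper optimizer such as a spreadsheet solver.

```python
import math


def sbg_survival(alpha, beta, periods):
    """Return [S(0), ..., S(periods)]: share still active after each renewal."""
    survival = [1.0]
    for t in range(1, periods + 1):
        # Retention rate rises with tenure as likely churners drop out early.
        retention = (beta + t - 1) / (alpha + beta + t - 1)
        survival.append(survival[-1] * retention)
    return survival


def log_likelihood(alpha, beta, active):
    """Log-likelihood of yearly active-customer counts for one cohort."""
    periods = len(active) - 1
    s = sbg_survival(alpha, beta, periods)
    ll = 0.0
    for t in range(1, periods + 1):
        lost = active[t - 1] - active[t]
        ll += lost * math.log(s[t - 1] - s[t])
    ll += active[-1] * math.log(s[periods])  # customers still active at the end
    return ll


def fit_sbg(active, step=0.05, n_steps=100):
    """Crude grid search over (alpha, beta); adequate for a sketch only."""
    grid = [step * i for i in range(1, n_steps + 1)]
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: log_likelihood(ab[0], ab[1], active))


# Hypothetical cohort: customers active in years one through five.
active = [1000, 500, 333, 250, 200]

alpha, beta = fit_sbg(active)
projected = sbg_survival(alpha, beta, 9)
year9 = active[0] * projected[8]
year10 = active[0] * projected[9]
print(f"alpha={alpha:.2f}, beta={beta:.2f}")
print(f"projected active customers: year 9 = {year9:.0f}, year 10 = {year10:.0f}")
```

The only inputs are the five yearly counts. No explanatory variables are involved, which is the point being made above.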
If companies move to using models more, what data can they stop collecting and what data will they still need to collect?
Ultimately, what matters most is behavior. That shouldn’t be a controversial statement, but a tremendous amount of the data that’s being collected is nonbehavioral. Data on demographics, psychographics, socioeconomics and even consumer attitudes can not only waste servers and storage space but can actually make the models perform worse. I have lots of examples of data that leads to tremendously misleading inferences about what really matters.
So behavior’s what matters most, and even then you can often summarize behavior in very simple ways. For instance, in many cases we find that you don’t even need to know exactly when each transaction occurred to make forecasts. Simply give me summary statistics, such as recency and frequency. Just tell me when was the last time they made a purchase and how many purchases they made over the last year, and that will explain pretty much everything worth explaining. You mentioned that a CIO Insight survey found that the amount of customer data companies are collecting is increasing at an annual rate of about 50 percent. I would claim that most of that 50 percent is completely wasted. It’s one thing to have 50 percent more data, but you’re certainly not getting 50 percent more knowledge or insight. In fact, you could be doing more harm than good, because you’re crowding out the few variables that really do matter.
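The two summary statistics described above, time since the last purchase and number of purchases over the last year, can be computed from a raw transaction log in a few lines. The log, the customer labels and the function name `recency_frequency` are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def recency_frequency(transactions, as_of, window_days=365):
    """Map customer -> (days since last purchase, purchases in the window)."""
    last_purchase = {}
    counts = defaultdict(int)
    for customer, day in transactions:
        age = (as_of - day).days
        if age < 0 or age > window_days:
            continue  # ignore purchases outside the one-year window
        counts[customer] += 1
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
    return {c: ((as_of - last_purchase[c]).days, counts[c]) for c in counts}


# Hypothetical transaction log: (customer, purchase date).
transactions = [
    ("A", date(2007, 1, 10)),
    ("A", date(2007, 3, 5)),
    ("A", date(2007, 5, 20)),
    ("B", date(2006, 9, 1)),
    ("B", date(2005, 12, 25)),  # older than one year, so excluded
]

summary = recency_frequency(transactions, as_of=date(2007, 6, 15))
print(summary)
```

Each customer is reduced to two numbers, however long the underlying transaction history is.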
What companies have done a good job of using models this way?
I wish I could put some companies on a pedestal, but I’ve never seen a firm really embrace this stuff as fully as I’d like. And I’ll tell you why: It’s really my fault. It’s the fault of academics who spend almost no time teaching these procedures. Most firms just aren’t getting exposed to this stuff.
What should CIOs do to help their companies use analytical and modeling tools appropriately?
For one thing, remember that more is not necessarily better. CIOs often push back on analytics because of cost, but if someone could give them all this additional data for free, they’d take it. That’s often wrong. Additional data can actually harm you because you’re going to start capturing random, quirky, idiosyncratic things that aren’t related to the true underlying propensities. The flipside is that a few simple measures that have been around forever, like recency and frequency, are all you need. If you can use data collection technology to get those measures more accurately or on a timelier basis, then maybe it’s worth the investment. Second, remember that some surprisingly simple models can take you incredibly far if you’re willing to not worry so much about drivers. Don’t bother looking for the drivers; first, capture the behavior. So start simple; that often means start in Excel. You’d be amazed at how much you can accomplish without even having to leave the spreadsheet environment.