Tag Archives: Blog Post

ECC 6.0 Release Notes

Release Note link for ECC 6.0 – Updated 032708



What Data Mining Can and Can’t Do


Peter Fader, professor of marketing at the University of Pennsylvania’s Wharton School, is the ultimate marketing quant: a world-class, award-winning expert on using behavioral data in sales forecasting and customer relationship management. He’s perhaps best known for his July 2000 expert witness testimony before the U.S. District Court in San Francisco that Napster actually boosted music sales. (Napster was then the subject of an injunction for copyright infringement and other allegations brought against it by several major music companies.)

The energetic and engaging marketing professor has a pet peeve: He hates to see companies waste time and money collecting terabytes of customer data in attempts to make conclusions and predictions that simply can’t be made. Fader has come up with an alternative, which he is researching and teaching: Complement data mining with probability models, which, he says, can be surprisingly simple to create. The following is an edited version of his conversation with CIO Insight Executive Editor Allan Alter.

CIO INSIGHT: What are the strengths and weaknesses of data mining and business intelligence tools?

FADER: Data mining tools are very good for classification purposes, for trying to understand why one group of people is different from another. What makes some people good credit risks or bad credit risks? What makes people Republicans or Democrats? To do that kind of task, I can’t think of anything better than data mining techniques, and I think it justifies some of the money that’s spent on it. Another question that’s really important isn’t which bucket people fall into, but when will things occur? How long will it be until this prospect becomes a customer? How long until this customer makes the next purchase? So many of the questions we ask have a longitudinal nature, and I think in that area data mining is quite weak. Data mining is good at saying, will it happen or not, but it’s not particularly good at saying when things will happen.

Data mining can be good for certain time-sensitive things, like is this retailer the kind that would probably order a particular product during the Christmas season. But when you want to make specific forecasts about what particular customers are likely to do in the future, not just which brand they’re likely to buy next, you need different sets of tools. There’s a tremendous amount of intractable randomness to people’s behavior that can’t be captured simply by collecting 600 different explanatory variables about the customer, which is what data mining is all about.

People keep thinking that if we collect more data, if we just understand more about customers, we can resolve all the uncertainty. It will never, ever work that way. The reasons people, say, drop one cell phone provider and switch to another are pretty much random. It happens for reasons that can’t be captured in a data warehouse. It could be an argument with a spouse, it could be that a kid hurt his ankle in a ballgame so he needs to do something, it could be that he saw something on TV. Rather than trying to expand data warehouses, in some sense my view is to wave the white flag and say let’s not even bother trying.

Do you think people understand the limitations of data mining?

They don’t. And this has nothing to do with data mining or marketing, but it has a lot to do with human nature. We’re seeing the same issues arising in every area of science. As data collection technology and model-building capabilities get better, people keep thinking they can answer the previously unknowable questions. But whether it’s the causes of diseases or mechanical failure, there’s only so much we can pin down by capturing data.

Do people who use data mining packages understand enough about how to use them?

I can’t make generalizations that are too broad, but there are some people who are hammers looking for nails. They think they can answer any problem using one set of procedures, and that’s a big mistake. When you go into other domains, you need to pull out different tools. One of the things that just makes me crazy is when people misuse the kinds of statistics that are associated with data mining. A lift curve will show us how well our predicted rank order of customer propensities corresponded to their actual behavior. That’s a fine thing to do in a classification setting, but it’s not particularly diagnostic in a longitudinal setting. We want ‘when’-type diagnostics to answer ‘when’-type questions. People just aren’t looking in the right places to see whether their model’s working.

Exactly what do you mean by a propensity as opposed to a behavior?

The difference is that just because people have a tendency to do things doesn’t mean that they will. You might be someone who buys from Amazon once a month on average. Does that mean over the next 10 years, over the next 120 months, you’ll buy 120 items? No. You could go two years without buying, or you might buy five items in a given month. The amount of variability around your propensity is huge. That’s where all this randomness comes in.
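As a rough illustration of the gap between propensity and behavior, the sketch below assumes (this is an assumption, not something stated in the interview) that monthly purchases follow a Poisson distribution with a steady rate of one per month. Even with a perfectly stable propensity, about 37 percent of months contain no purchase at all, and some contain several.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k purchases in a period with the given mean rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Assumed propensity: one purchase a month on average (hypothetical value).
rate_per_month = 1.0

p_none = poisson_pmf(0, rate_per_month)
p_two_or_more = 1.0 - p_none - poisson_pmf(1, rate_per_month)
p_five_or_more = 1.0 - sum(poisson_pmf(k, rate_per_month) for k in range(5))

print(f"P(no purchase in a month)   = {p_none:.3f}")
print(f"P(two or more in a month)   = {p_two_or_more:.3f}")
print(f"P(five or more in a month)  = {p_five_or_more:.4f}")
```

Real customers also differ from one another in their underlying rates, which widens the spread further than this single-rate sketch suggests.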

Have companies hurt themselves by misusing data mining tools?

Let me start with a positive example. I have tremendous admiration for what actuaries do, and therefore for the way insurance companies deal with their customers. Actuaries will not look at all your characteristics and say when you will die. They’ll simply come up with a probabilistic statement about the likelihood that someone with your characteristics will die, or what percent of people who share characteristics will live to be 70. They understand that it’s pretty much impossible to make statements about each and every policyholder.

Now, carry that over to the marketing world. Lots of firms talk about one-to-one marketing. I think that’s a real disservice to most industries. One-to-one marketing only works when you have a very deep relationship with every customer. So one-to-one marketing works great in private wealth management, or in a business-to-business setting where you meet with the client at least once a month, and understand not just their business needs but what’s going on in their life. But in areas approaching a mass market, where you can’t truly distinguish each individual, you just have a bunch of people and a bunch of characteristics that describe them. Then the notion of one-to-one marketing is terrible. It will do more harm than good, because the customers will act more randomly than you expect, and the cost of trying to figure out what specific customers will do far outweighs the benefits you could get from that level of detail.

It’s very hard to say who’s going to buy this thing and when. To take that uncertainty and square it by looking across two products, or to raise it to the nth power by looking across a large portfolio of products, and say “these two go together,” and make deterministic statements as opposed to talking about tendencies and probabilities, can be very, very harmful. It’s much more important for companies to come up with appropriate groupings of similar people, and make statements about them as a group.

I don’t want to pick on Amazon in particular; they really tout the capabilities of their recommendations systems. But maybe this customer was going to buy book B anyway, and therefore all the recommendations were irrelevant. Or maybe they were going to buy book C, which would have been a higher-margin item, so getting them to buy book B was a mistake. Or maybe they’re becoming so upset by irrelevant recommendations that they’re going away entirely. I don’t want in any way to suggest that cross-selling shouldn’t be done, but what I’m suggesting is that the net gains from it are less than people might think. It often can’t justify the kinds of investments that firms are making in it.

You’ve been championing the use of probability models as an alternative to data mining tools. What do you mean by a probability model?

Probability models are a class of models that people used back in the old days when data weren’t abundantly available. These modeling procedures are based on a few premises: People do things in a random manner; the randomness can be characterized by simple probability distributions; and the propensities for people to do things vary over time, across people, and across circumstances. Probably the best known example is survival analysis, which stems largely from the actuarial sciences. It’s also used in manufacturing. You put a bunch of lightbulbs on a testing board and see how long they last. In many ways, that’s what I suggest we do with customers. We’re not going to make statements about any one lightbulb, just like we shouldn’t make statements about any one customer. We’ll make collective statements about how many of these bulbs will last for 1,000 hours. It turns out that the analogy of survival analysis in manufacturing and actuarial and life sciences carries over amazingly well to customers. A lot of managers would bristle at the idea, but I think that metaphor is far better than all this excessive customization and personalization that’s been going on. Customers are different from each other just as lightbulbs are, but for reasons that we can’t detect, and reasons that we’ll have a very hard time taking advantage of.

What kind of problems can probability models solve?

Probability models have three basic building blocks. One is timing: how long until something happens. One is counting: how many arrivals, how many purchases or whatever will we see over a given period of time. And one is choice: given an opportunity to do something, how many people will choose to do it. That’s it. Most real-world business problems are just some combination of those building blocks jammed together. For instance, if you’re modeling the total time someone spends at a Web site during a given month, you might model it as counting-timing: a count model for the number of visits and a timing model for the duration of each one. My view is that we can very easily build simple models in Excel for each of those three things. A lot of people have built this kind of model over the years, and have tested them very carefully, in some cases putting them directly up against data mining procedures. They have found that their capabilities are not only astonishing, but far better than data mining. If you think about all the different ways you can combine timing, counting and choice, you can tell all kinds of interesting stories about different business situations.
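The counting-timing combination described for the Web site example can be sketched as a small simulation. The distributions here are assumptions chosen for simplicity (Poisson visit counts, exponential visit durations), and the rates of 4 visits a month and 5 minutes a visit are hypothetical; the interview does not specify any of them.

```python
import random

def sample_visit_count(visits_per_month, rng):
    """Counting block: a Poisson number of visits, drawn by counting how many
    exponential inter-arrival gaps fit inside one month."""
    count, elapsed = 0, rng.expovariate(visits_per_month)
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(visits_per_month)
    return count

def sample_monthly_minutes(visits_per_month, mean_minutes_per_visit, rng):
    """Counting-timing: total minutes = sum of one duration per visit."""
    visits = sample_visit_count(visits_per_month, rng)
    # Timing block: each visit's duration is exponential with the given mean.
    return sum(rng.expovariate(1.0 / mean_minutes_per_visit) for _ in range(visits))

rng = random.Random(2007)  # fixed seed so the sketch is repeatable
draws = [sample_monthly_minutes(4.0, 5.0, rng) for _ in range(20000)]

average_minutes = sum(draws) / len(draws)
share_with_no_visits = sum(1 for d in draws if d == 0) / len(draws)

print(f"average minutes per month: {average_minutes:.1f}")
print(f"share of months with no visits: {share_with_no_visits:.3f}")
```

The simulated average should sit near 4 × 5 = 20 minutes, while individual months range from zero to several times that, which is the kind of collective statement these models are meant to support.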

How would you use these models to identify the most profitable customers or calculate customer lifetime value?

This is where probability models can come together beautifully with data mining. We can use these models to come up with very accurate forecasts about how long this customer will stay with us or how many purchases they’ll make over the next year. So use the basic probability model to capture the basic behavior and then bring in data mining to understand why groups of customers with different behavioral tendencies are different from each other. You see, behavior itself is not perfectly indicative of the true underlying propensities, which is what managers really want to know. And so we build a probability model that helps us uncover the propensities, and then we can take those propensities (the customer’s tendency to do something quickly or slowly, or to stay online a long time or not) and throw those into the data mining engine to explain those as a function of the 600 variables. You’ll find a much more satisfying and fruitful explanation in terms of being able to profile new customers and understand the likely actions of current ones. When it comes to taking the outputs of the probability model and understanding them, data mining procedures are the best way to go.

Can probability models capture longitudinal or predictive information?

Very, very well. In fact, one of my favorite examples is looking at customer retention and return. You can do it simply without any explanatory variables at all. The irony is that if you bring in explanatory variables, in many cases the model will do worse. This makes managers crazy. They need to know why these people are different. But if you’re bringing in explanatory variables that aren’t really capturing the true underlying reasons for the differences, then you’re just adding noise to the system. Your ability to come up with an accurate forecast for each group might actually be worse.

So you use data mining to help you figure out why those propensities exist.

That’s right. The key is to explain the propensities, the tendency to do things, as opposed to the behavior itself.

You said these models can be built in a spreadsheet. It doesn’t sound like you have to be a high-powered Ph.D. to create them.

Of course, that never hurts. But yes, these models are far more transparent to managers because the stories they tell are simpler, the demands on the data are far simpler, and the implementation is much easier. So what I like to do is to start people out with some of the really simple models and get people hooked. Show me how many customers we’ve had in year one, two, three, four, five, and I’ll tell you how many we’ll have in year nine and ten before we even bring in all the explanatory variables that data miners want to use.
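The interview does not name the model behind this five-years-in, ten-years-out claim. One simple candidate, assumed here purely for illustration, is the shifted-beta-geometric (sBG) retention model, in which each customer has a constant churn probability and those probabilities vary across customers according to a beta distribution. The cohort counts below are hypothetical, and the coarse grid search stands in for the spreadsheet solver one would normally use.

```python
import math

def sbg_survival(alpha, beta, periods):
    """Fraction of the starting cohort still active after 1..periods renewals."""
    surviving, curve = 1.0, []
    for t in range(1, periods + 1):
        # Retention rate in period t under the sBG model.
        surviving *= (beta + t - 1) / (alpha + beta + t - 1)
        curve.append(surviving)
    return curve

def sbg_log_likelihood(alpha, beta, counts):
    """Log-likelihood of observed cohort counts (year 1 is the full cohort)."""
    curve = [1.0] + sbg_survival(alpha, beta, len(counts) - 1)
    ll = 0.0
    for t in range(1, len(counts)):
        lost = counts[t - 1] - counts[t]  # customers who left in period t
        ll += lost * math.log(curve[t - 1] - curve[t])
    return ll + counts[-1] * math.log(curve[-1])  # customers still active

def fit_sbg(counts):
    """Crude maximum-likelihood fit over a grid of (alpha, beta) values."""
    grid = [0.05 * i for i in range(1, 101)]
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: sbg_log_likelihood(ab[0], ab[1], counts))

# Hypothetical cohort: active customers in years one through five.
counts = [1000, 640, 480, 390, 330]
alpha, beta = fit_sbg(counts)

curve = [1.0] + sbg_survival(alpha, beta, 9)
year_nine = counts[0] * curve[8]
year_ten = counts[0] * curve[9]
print(f"alpha={alpha:.2f}, beta={beta:.2f}")
print(f"projected customers: year 9 = {year_nine:.0f}, year 10 = {year_ten:.0f}")
```

Note that the fit uses only the five counts; no demographic or other explanatory variables enter the projection.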

If companies move to using models more, what data can they stop collecting and what data will they still need to collect?

Ultimately, what matters most is behavior. That shouldn’t be a controversial statement, but a tremendous amount of the data that’s being collected is nonbehavioral. Data on demographics, psychographics, socioeconomics and even consumer attitudes can not only waste servers and storage space but can actually make the models perform worse. I have lots of examples of data that leads to tremendously misleading inferences about what really matters.

So behavior’s what matters most, and even then you can often summarize behavior in very simple ways. For instance, in many cases we find that you don’t even need to know exactly when each transaction occurred to make forecasts. Simply give me summary statistics, such as frequency. Just tell me when was the last time they made a purchase and how many purchases they made over the last year, and that will explain pretty much everything worth explaining. You mentioned that a CIO Insight survey found that the amount of customer data companies are collecting is increasing at an annual rate of about 50 percent. I would claim that most of that 50 percent is completely wasted. It’s one thing to have 50 percent more data, but you’re certainly not getting 50 percent more knowledge or insight. In fact, you could be doing more harm than good, because you’re crowding out the few variables that really do matter.
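The recency and frequency summaries mentioned above take only a few lines to compute from a raw transaction log. The sketch below is a minimal version under stated assumptions: the log format (customer ID plus purchase date), the one-year window, and the sample records are all hypothetical.

```python
from datetime import date

def recency_frequency(transactions, as_of, window_days=365):
    """Summarize each customer's purchases in the window ending at as_of.

    transactions: iterable of (customer_id, purchase_date) pairs.
    Returns {customer_id: {"days_since_last": int, "purchases": int}}.
    """
    latest, counts = {}, {}
    for customer, when in transactions:
        age = (as_of - when).days
        if 0 <= age <= window_days:  # ignore purchases outside the window
            counts[customer] = counts.get(customer, 0) + 1
            if customer not in latest or when > latest[customer]:
                latest[customer] = when
    return {
        customer: {
            "days_since_last": (as_of - latest[customer]).days,
            "purchases": counts[customer],
        }
        for customer in counts
    }

# Hypothetical transaction log.
log = [
    ("a", date(2007, 12, 1)),
    ("a", date(2007, 6, 1)),
    ("a", date(2006, 1, 1)),   # older than one year, so it is excluded
    ("b", date(2007, 3, 15)),
]
summary = recency_frequency(log, as_of=date(2007, 12, 31))
print(summary)
```

Two numbers per customer replace the full transaction history, which is the kind of compression the passage argues is usually sufficient for forecasting.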

What companies have done a good job of using models this way?

I wish I could put some companies on a pedestal, but I’ve never seen a firm really embrace this stuff as fully as I’d like. And I’ll tell you why: It’s really my fault. It’s the fault of academics who spend almost no time teaching these procedures. Most firms just aren’t getting exposed to this stuff.

What should CIOs do to help their companies use analytical and modeling tools appropriately?

For one thing, remember that more is not necessarily better. CIOs often push back on analytics because of cost, but if someone could give them all this additional data for free, they’d take it. That’s often wrong. Additional data can actually harm you because you’re going to start capturing random, quirky, idiosyncratic things that aren’t related to the true underlying propensities. The flipside is that a few simple measures that have been around forever, like recency and frequency, are all you need. If you can use data collection technology to get those measures more accurately or on a timelier basis, then maybe it’s worth the investment. Second, remember that some surprisingly simple models can take you incredibly far if you’re willing to not worry so much about drivers. Don’t bother looking for the drivers; first, capture the behavior. So start simple; that often means start in Excel. You’d be amazed at how much you can accomplish without even having to leave the spreadsheet environment.

Copyright (c) 2007 Ziff Davis Media Inc. All Rights Reserved.


This site has a useful compilation of SAP information for ABAP and BW developers.

Swivel – Wide Open Data Exploration Community


OECD and Swivel Invite Curious People to Explore, Discuss and Debate the OECD Factbook

SAN FRANCISCO, CA — (MARKET WIRE) — April 16, 2007 — Swivel, a data exploration Web site for curious people, today announced that The Organisation for Economic Co-operation and Development (OECD) will make its 2007 OECD Factbook available on Swivel’s site, http://www.swivel.com. The OECD is an intergovernmental organization that facilitates discussion among its member countries on economic, social and environmental issues. Now, inquisitive people can easily obtain the most accurate and current set of economic, social and environmental indicators worldwide and discuss them openly with a community of interested peers. Read the full release

Swivel in Nature

Declan Butler of the journal Nature wrote an article about Swivel and IBM’s Many Eyes in the March 1, 2007 edition. From the article: “I’m often frustrated by my inability to analyse in a different way data that are printed in peer-reviewed publications, when I’m interested in looking at a relationship that the authors didn’t think of,” [Brent Edwards, director of the Starkey Hearing Research Center in Berkeley, California] says. If research organizations and journals linked the raw data behind papers to social software tools such as Swivel and Many Eyes, he argues, “it would have considerable value to the scientific community as a whole”. Read the full article

Swivel in Fast Company

Michael Prospero at Fast Company wrote an article about Swivel in the March 2007 issue. He and the graphics team there did a cool job telling the story of how we got Swivel off the ground. Here’s an excerpt: “Swivel, a new startup, lets users upload, compare, and contrast data – from iPod sales to wine consumption – to make sense of the world. A Web 2.0 story in charts.” You can read it on page 26 of the print magazine or, if you are a subscriber, you can read it online.

Swivel Mentioned in Wired

Wired mentioned Swivel in their Playlist for February: “Imagine our delight at a Web site that not only lets you play with other people’s data but also helps you make your own charts! (Yes, we’re nerds: and that surprises you why?) Upload Excel files or enter your own figures. From there, create a mashup of your data with someone else’s, pick a pretty chart style, and kiss Excel ugliness good-bye.” Read the full playlist

The TechCrunch Blog Post That Started It All

Michael Arrington single-handedly launched Swivel Preview into the blogosphere when he wrote: “Swivel Co-founders Dmitry Dimov and Brian Mulloy start off by describing their company as ‘YouTube for Data.’ That’s a good start for someone trying to understand it, because the site allows users to upload data – any data – and display it to other users visually. The number of page views your website generates. Or a stock price over time. Weather data. Commodity prices. The number of Bald Eagles in Washington state. Whatever. Uploaded data can be rated, commented and bookmarked by other users, helping to sort the interesting (and accurate) wheat from the chaff. And graphs of data can be embedded into websites. So it is in fact a bit like a YouTube for Data.” Read the post

Roadpost – Rent an Iridium Satellite Phone


Once the privilege only of the military elite, satellite phones have hit the mainstream, with rental plans for as little as $7 per day – a small insurance fee for a potentially life-saving technology.

During July I kayaked for three days in the Garden Islands in northern Lake Michigan, launching from the Upper Peninsula. The wind blew continuously over a long fetch (up the length of Green Bay to the northeast), and waves grew to six feet.

The area is quite remote and our party of three didn’t see another vessel when we were out on the water.

Sea kayaking mishaps tend to have a cascading effect, with second and third failures compounding the initial capsize: for example, a second kayak capsizes while rescuing the first, flares fail to work, the victim becomes hypothermic, the cell phone gets wet, and so on.

In a remote area our handheld VHF has a range of 1 to 3 miles – which would have been useless to us if we had capsized many miles from the nearest port.

It is my personal wilderness policy to review any safety issues after a trip and take action to prevent them in the future. I was not comfortable that we had adequate emergency communication should we need it. Fortunately we avoided an emergency and had no reason to call for help, but next time I want to be sure, so I began researching emergency communications options.

Two long-range emergency communications options are available: a GPS beacon or EPIRB (Emergency Position-Indicating Radio Beacon), and a satellite phone.

Satellite phone rentals now put this otherwise prohibitively expensive option within easy reach of sea kayakers and other wilderness adventurers. Simply rent the phone for delivery in time for your scheduled departure, and bring the waterproof box.

Roadpost seems to be a well-organized service, but others are available. The principal brands are Iridium and Qualcomm.

Visual Composer Service Content

SAP NetWeaver Visual Composer is a powerful visual design tool that enables content developers and business design experts to create model-driven SAP NetWeaver content without having to write code manually. By facilitating content development, SAP NetWeaver Visual Composer lowers the total cost of ownership (TCO) and increases the return on investment (ROI).

SAP NetWeaver Visual Composer is available for download on the SAP Service Marketplace via quick link /patches (SAP Support Packages and Patches > Entry by Application Group > SAP NetWeaver Components > SAP NetWeaver > SAP NetWeaver 04 > Visual Composer > Visual Composer 6.0 > NT/I386 > MS SQL Server).

The Installation Guide for SAP NetWeaver Visual Composer 6.0 is available on the SAP Service Marketplace via quick link /instguides (Installation and Upgrade Guides > SAP NetWeaver > Release 04 > Installation > SAP VC Composer).

The User Guide for SAP NetWeaver Visual Composer 6.0 is available on the SAP Help Portal (Documentation > SAP NetWeaver > Prior to SAP NetWeaver ’04 > SAP Visual Composer (6.0)).

For the latest information on SAP NetWeaver Visual Composer, please refer to Central Note 716752 (Release Note).

Analytic xApp Learning Resources

This SDN resource lists eLearning resources for analytical xApps. xApps is SAP’s term for “cross applications”: applications that can be deployed within NetWeaver but may come from any external source, provided they are coded in a compliant format.

A large number of xApps are available for download at http://service.sap.com/swdc in the Application Area section under xApps for analytics.