Newly acquired nearline storage compression technology for SAP BI.
The enterprise does not only want the ability to look back; it wants the ability to look forward.
Forecasting the future of business intelligence, InfoWorld Weblog, December 11, 2007, by Ephraim Schwartz
This is an interesting development that helps SAP’s BI solutions look much more robust.
One of my Steelcase analytics colleagues referred to BW as "the data jailhouse." A Ph.D. in statistics, he saw the BW layer as nothing but a nuisance sitting between the source data and his real analytics, which he did in SAS.
Now the bigger fish, SAP, has swallowed a fish, Business Objects, which in turn had swallowed SPSS. Taken together, this gives us the ability to provide a robust analytical solution within SAP for the first time ever.
Still to be explored are the migration paths from the current BI 3.x and BI 7 releases to a full BI / BO / SPSS platform.
Peter Fader, professor of marketing at University of Pennsylvania’s Wharton School, is the ultimate marketing quant—a world-class, award-winning expert on using behavioral data in sales forecasting and customer relationship management. He’s perhaps best known for his July 2000 (PDF) expert witness testimony before the U.S. District Court in San Francisco that Napster actually boosted music sales. (Napster was then the subject of an injunction for copyright infringement and other allegations brought against it by several major music companies.)
The energetic and engaging marketing professor has a pet peeve: He hates to see companies waste time and money collecting terabytes of customer data in attempts to make conclusions and predictions that simply can’t be made. Fader has come up with an alternative, which he is researching and teaching: Complement data mining with probability models, which, he says, can be surprisingly simple to create. The following is an edited version of his conversation with CIO Insight Executive Editor Allan Alter.
CIO INSIGHT: What are the strengths and weaknesses of data mining and business intelligence tools?
FADER: Data mining tools are very good for classification purposes, for trying to understand why one group of people is different from another. What makes some people good credit risks or bad credit risks? What makes people Republicans or Democrats? To do that kind of task, I can’t think of anything better than data mining techniques, and I think it justifies some of the money that’s spent on it. Another question that’s really important isn’t which bucket people fall into, but when will things occur? How long will it be until this prospect becomes a customer? How long until this customer makes the next purchase? So many of the questions we ask have a longitudinal nature, and I think in that area data mining is quite weak. Data mining is good at saying, will it happen or not, but it’s not particularly good at saying when things will happen.
Data mining can be good for certain time-sensitive things, like is this retailer the kind that would probably order a particular product during the Christmas season. But when you want to make specific forecasts about what particular customers are likely to do in the future, not just which brand they’re likely to buy next, you need different sets of tools. There’s a tremendous amount of intractable randomness to people’s behavior that can’t be captured simply by collecting 600 different explanatory variables about the customer, which is what data mining is all about.
People keep thinking that if we collect more data, if we just understand more about customers, we can resolve all the uncertainty. It will never, ever work that way. The reasons people, say, drop one cell phone provider and switch to another are pretty much random. It happens for reasons that can’t be captured in a data warehouse. It could be an argument with a spouse, it could be that a kid hurt his ankle in a ballgame so he needs to do something, it could be that he saw something on TV. Rather than trying to expand data warehouses, in some sense my view is to wave the white flag and say let’s not even bother trying.
Do you think people understand the limitations of data mining?
They don’t. And this has nothing to do with data mining or marketing, but it has a lot to do with human nature. We’re seeing the same issues arising in every area of science. As data collection technology and model-building capabilities get better, people keep thinking they can answer the previously unknowable questions. But whether it’s the causes of diseases or mechanical failure, there’s only so much we can pin down by capturing data.
Do people who use data mining packages understand enough about how to use them?
I can’t make generalizations that are too broad, but there are some people who are hammers looking for nails. They think they can answer any problem using one set of procedures, and that’s a big mistake. When you go into other domains, you need to pull out different tools. One of the things that just makes me crazy is when people misuse the kinds of statistics that are associated with data mining. A lift curve will show us how well our predicted rank order of customer propensities corresponded to their actual behavior. That’s a fine thing to do in a classification setting, but it’s not particularly diagnostic in a longitudinal setting. We want ‘when’-type diagnostics to answer ‘when’-type questions. People just aren’t looking in the right places to see whether their model’s working.
Exactly what do you mean by a propensity as opposed to a behavior?
The difference is that just because people have a tendency to do things doesn’t mean that they will. You might be someone who buys from Amazon once a month on average. Does that mean over the next 10 years, over the next 120 months, you’ll buy 120 items? No. You could go two years without buying, or you might buy five items in a given month. The amount of variability around your propensity is huge. That’s where all this randomness comes in.
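Fader's Amazon example can be made concrete with a small simulation (an illustrative sketch of my own, not a model he endorses): a hypothetical customer whose underlying propensity is one purchase per month, played out over 120 months.

```python
import math
import random

random.seed(42)

def poisson(lam: float) -> int:
    """Draw from a Poisson distribution using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# A hypothetical customer whose true propensity is 1 purchase per month.
monthly = [poisson(1.0) for _ in range(120)]
print("total purchases over 120 months:", sum(monthly))
print("months with zero purchases:", monthly.count(0))
print("busiest single month:", max(monthly))
```

Even though the expected total is 120, a single run rarely lands on it exactly, and dozens of months show no purchase at all: that is the gap between propensity and behavior.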
Have companies hurt themselves by misusing data mining tools?
Let me start with a positive example. I have tremendous admiration for what actuaries do, and therefore for the way insurance companies deal with their customers. Actuaries will not look at all your characteristics and say when you will die. They’ll simply come up with a probabilistic statement about the likelihood that someone with your characteristics will die, or what percent of people who share characteristics will live to be 70. They understand that it’s pretty much impossible to make statements about each and every policyholder.
Now, carry that over to the marketing world. Lots of firms talk about one-to-one marketing. I think that’s a real disservice to most industries. One-to-one marketing only works when you have a very deep relationship with every customer. So one-to-one marketing works great in private wealth management, or in a business-to-business setting where you meet with the client at least once a month, and understand not just their business needs but what’s going on in their life. But in areas approaching a mass market, where you can’t truly distinguish each individual, you just have a bunch of people and a bunch of characteristics that describe them. Then the notion of one-to-one marketing is terrible. It will do more harm than good, because the customers will act more randomly than you expect, and the cost of trying to figure out what specific customers will do far outweighs the benefits you could get from that level of detail.
It’s very hard to say who’s going to buy this thing and when. To take that uncertainty and square it by looking across two products, or to raise it to the nth power by looking across a large portfolio of products, and say "these two go together," and make deterministic statements as opposed to talking about tendencies and probabilities, can be very, very harmful. It’s much more important for companies to come up with appropriate groupings of similar people, and make statements about them as a group.
I don’t want to pick on Amazon in particular; they really tout the capabilities of their recommendations systems. But maybe this customer was going to buy book B anyway, and therefore all the recommendations were irrelevant. Or maybe they were going to buy book C, which would have been a higher-margin item, so getting them to buy book B was a mistake. Or maybe they’re becoming so upset by irrelevant recommendations that they’re going away entirely. I don’t want in any way to suggest that cross-selling shouldn’t be done, but what I’m suggesting is that the net gains from it are less than people might think. It often can’t justify the kinds of investments that firms are making in it.
You’ve been championing the use of probability models as an alternative to data mining tools. What do you mean by a probability model?
Probability models are a class of models that people used back in the old days, when data weren't abundantly available. These modeling procedures are based on a few premises: People do things in a random manner; the randomness can be characterized by simple probability distributions; and the propensities for people to do things vary over time, across people and across circumstances. Probably the best-known example is survival analysis, which stems largely from the actuarial sciences. It's also used in manufacturing. You put a bunch of lightbulbs on a testing board and see how long they last. In many ways, that's what I suggest we do with customers. We're not going to make statements about any one lightbulb, just like we shouldn't make statements about any one customer. We'll make collective statements about how many of these bulbs will last for 1,000 hours. It turns out that the analogy of survival analysis in manufacturing and actuarial and life sciences carries over amazingly well to customers. A lot of managers would bristle at the idea, but I think that metaphor is far better than all this excessive customization and personalization that's been going on. Customers are different from each other just as lightbulbs are, but for reasons that we can't detect, and reasons that we'll have a very hard time taking advantage of.
What kind of problems can probability models solve?
Probability models have three basic building blocks: One is timing: how long until something happens. One is counting: how many arrivals, how many purchases or whatever will we see over a given period of time. And choice: given an opportunity to do something, how many people will choose to do it. That's it. Most real-world business problems are just some combination of those building blocks jammed together. For instance, if you're modeling the total time someone spends at a Web site during a given month, you might model it as counting-timing: a count model for the number of visits and a timing model for the duration of each one. My view is that we can very easily build simple models in Excel for each of those three things. A lot of people have built this kind of model over the years, and have tested them very carefully, in some cases putting them directly up against data mining procedures. They have found that their capabilities are not only astonishing, but far better than data mining. If you think about all the different ways you can combine timing, counting and choice, you can tell all kinds of interesting stories about different business situations.
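The three building blocks can each be sketched with a textbook distribution (illustrative stand-ins of my choosing, not the specific specifications Fader teaches): an exponential draw for timing, an event count over a window for counting, and a per-person coin flip for choice.

```python
import random

random.seed(7)

# Timing: how long until the next event, exponential with rate `lam` per month.
def time_until_event(lam: float) -> float:
    return random.expovariate(lam)

# Counting: how many events occur in `t` months, by accumulating exponential gaps.
def events_in_period(lam: float, t: float) -> int:
    count, clock = 0, 0.0
    while True:
        clock += random.expovariate(lam)
        if clock > t:
            return count
        count += 1

# Choice: of `n` people who each act with probability `p`, how many act.
def people_who_act(n: int, p: float) -> int:
    return sum(random.random() < p for _ in range(n))

print(f"{time_until_event(0.5):.1f} months until the next purchase")
print(f"{events_in_period(0.5, 12)} purchases this year")
print(f"{people_who_act(1000, 0.1)} of 1,000 prospects convert")
```

Jamming the blocks together works the same way: a count model for the number of site visits plus a timing model for each visit's duration reproduces the counting-timing example above.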
How would you use these models to identify the most profitable customers or calculate customer lifetime value?
This is where probability models can come together beautifully with data mining. We can use these models to come up with very accurate forecasts about how long this customer will stay with us or how many purchases they'll make over the next year. So use the basic probability model to capture the basic behavior and then bring in data mining to understand why groups of customers with different behavioral tendencies are different from each other. You see, behavior itself is not perfectly indicative of the true underlying propensities, which is what managers really want to know. And so we build a probability model that helps us uncover the propensities, and then we can take those propensities, the customer's tendency to do something quickly or slowly or to stay online a long time or not, and throw those into the data mining engine to explain those as a function of the 600 variables. You'll find a much more satisfying and fruitful explanation in terms of being able to profile new customers and understand the likely actions of current ones. When it comes to taking the outputs of the probability model and understanding them, data mining procedures are the best way to go.
Can probability models capture longitudinal or predictive information?
Very, very well. In fact, one of my favorite examples is looking at customer retention and return. You can do it simply without any explanatory variables at all. The irony is that if you bring in explanatory variables, in many cases the model will do worse. This makes managers crazy. They need to know why these people are different. But if you’re bringing in explanatory variables that aren’t really capturing the true underlying reasons for the differences, then you’re just adding noise to the system. Your ability to come up with an accurate forecast for each group might actually be worse.
So you use data mining to help you figure out why those propensities exist.
That's right. The key is to explain the propensities, the tendency to do things, as opposed to the behavior itself.
You said these models can be built in a spreadsheet. It doesn’t sound like you have to be a high-powered Ph.D. to create them.
Of course, that never hurts. But yes, these models are far more transparent to managers because the stories they tell are simpler, the demands on the data are far lighter, and the implementation is much easier. So what I like to do is start people out with some of the really simple models and get them hooked. Show me how many customers we've had in years one, two, three, four and five, and I'll tell you how many we'll have in years nine and ten, before we even bring in all the explanatory variables that data miners want to use.
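The years-one-through-five exercise resembles the shifted-beta-geometric (sBG) retention model associated with Fader and Bruce Hardie. A rough spreadsheet-grade sketch, with a hypothetical cohort and a crude grid search standing in for a proper maximum-likelihood fit:

```python
import math

def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def sbg_survival(a: float, b: float, t: int) -> float:
    """P(still a customer after t renewal opportunities) under the sBG model."""
    return math.exp(log_beta(a, b + t) - log_beta(a, b))

# Hypothetical cohort: 1,000 customers acquired; survivors at the end of years 1-5.
alive = [1000, 631, 468, 382, 326, 289]  # index t = survivors after year t

# Crude grid search for (alpha, beta); a real fit would maximize the likelihood.
best, best_err = (1.0, 1.0), float("inf")
for a10 in range(1, 51):
    for b10 in range(1, 51):
        a, b = a10 / 10, b10 / 10
        err = sum((alive[0] * sbg_survival(a, b, t) - alive[t]) ** 2
                  for t in range(1, 6))
        if err < best_err:
            best, best_err = (a, b), err

a, b = best
for year in (9, 10):
    print(f"projected survivors after year {year}: "
          f"{alive[0] * sbg_survival(a, b, year):.0f}")
```

Note that the projection uses nothing but the yearly head counts; no explanatory variables enter at all, which is exactly Fader's point.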
If companies move to using models more, what data can they stop collecting and what data will they still need to collect?
Ultimately, what matters most is behavior. That shouldn’t be a controversial statement, but a tremendous amount of the data that’s being collected is nonbehavioral. Data on demographics, psychographics, socioeconomics and even consumer attitudes can not only waste servers and storage space but can actually make the models perform worse. I have lots of examples of data that leads to tremendously misleading inferences about what really matters.
So behavior’s what matters most, and even then you can often summarize behavior in very simple ways. For instance, in many cases we find that you don’t even need to know exactly when each transaction occurred to make forecasts. Simply give me summary statistics, such as frequency. Just tell me when was the last time they made a purchase and how many purchases they made over the last year, and that will explain pretty much everything worth explaining. You mentioned that a CIO Insight survey found that the amount of customer data companies are collecting is increasing at an annual rate of about 50 percent. I would claim that most of that 50 percent is completely wasted. It’s one thing to have 50 percent more data, but you’re certainly not getting 50 percent more knowledge or insight. In fact, you could be doing more harm than good, because you’re crowding out the few variables that really do matter.
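The recency and frequency summaries he describes fall straight out of a raw transaction log; a minimal sketch using made-up data (the customer IDs and dates are hypothetical):

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date).
transactions = [
    ("C1", date(2005, 1, 10)), ("C1", date(2005, 6, 2)),
    ("C1", date(2005, 11, 20)), ("C2", date(2005, 3, 5)),
    ("C3", date(2005, 9, 1)), ("C3", date(2005, 12, 15)),
]
as_of = date(2005, 12, 31)  # the "today" the summary is computed against

# Reduce the log to (date of last purchase, number of purchases) per customer.
summary = {}
for cust, when in transactions:
    last, count = summary.get(cust, (when, 0))
    summary[cust] = (max(last, when), count + 1)

for cust, (last, count) in sorted(summary.items()):
    print(f"{cust}: recency={(as_of - last).days} days, frequency={count}")
```

Two numbers per customer, recomputed from the log on demand, are the whole data requirement for the simple models he advocates.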
What companies have done a good job of using models this way?
I wish I could put some companies on a pedestal, but I’ve never seen a firm really embrace this stuff as fully as I’d like. And I’ll tell you why: It’s really my fault. It’s the fault of academics who spend almost no time teaching these procedures. Most firms just aren’t getting exposed to this stuff.
What should CIOs do to help their companies use analytical and modeling tools appropriately?
For one thing, remember that more is not necessarily better. CIOs often push back on analytics because of cost, but if someone could give them all this additional data for free, they’d take it. That’s often wrong. Additional data can actually harm you because you’re going to start capturing random, quirky, idiosyncratic things that aren’t related to the true underlying propensities. The flipside is that a few simple measures that have been around forever, like recency and frequency, are all you need. If you can use data collection technology to get those measures more accurately or on a timelier basis, then maybe it’s worth the investment. Second, remember that some surprisingly simple models can take you incredibly far if you’re willing to not worry so much about drivers. Don’t bother looking for the drivers; first, capture the behavior. So start simple; that often means start in Excel. You’d be amazed at how much you can accomplish without even having to leave the spreadsheet environment.
I'm familiar with the wonderfully effective http://planetfeedback.com site, which provides a single "service complaint portal" (and, to be fair, also handles positive feedback).
But on the business side of the equation, Intelliseek, which owns PlanetFeedback, has powerful web-based business intelligence tools and services that help companies use high-tech datamining tools to glean customer attitudinal data from the web and other sources.
You can learn more by going to http://intelliseek.com, or read this brief summary.
According to Intelliseek's published corporate background,
“We are among the first companies to pioneer the concept of 360-degree marketing intelligence, challenging companies and brands to look holistically across all their consumer or customer touch points. We’ve married this business concept with world-class discovery and mining technology, and we deliver high-impact marketing automation and marketing transformation solutions to progressive marketers, researchers and analysts.
Technology leadership is another distinguishing characteristic that sets us apart. Since 1997 and through a series of technology and corporate acquisitions, we have focused on developing a type of technology that answers the fundamental question: ‘how can we improve professionals’ research and analysis capabilities, how can we leverage the exponentially growing amount of unstructured content that exists within and outside any organization, company or enterprise?’
We are leaders in federated content discovery technologies, providing access to more sources, more data types and more languages than other vendors. We provide rich administration tools to add/remove new content sources. We specialize in industry-standard, open architecture and APIs for seamless integration into any large enterprise network, and we create user-friendly applications.
We are leading with content mining technologies featuring classifiers/categorizers that are suitable to different types of content and state-of-the-art machine-learning techniques that extract entities, relationships, sentiments, facts and events from unstructured data.
We have unique methodologies and step-by-step processes in which disparate, unstructured content is converted into actionable business intelligence — charts, graphs and alerts.”
Here is information on a specific solution offering – BrandPulse.
What is BrandPulse Internet™?
BrandPulse Internet™ is an Internet Monitoring application that helps marketers, market researchers and product developers measure and track the pulse of consumer "buzz" about any brand, company, or emerging issue. The BrandPulse Internet solution collects and analyzes content from public online databases and discussion boards, and it reports actionable insights via a convenient digital dashboard. BrandPulse represents Real-time Marketing Intelligence at its best.
How Can BrandPulse Internet™ Help You?
Online consumer discussion data is reported in an easy-to-use, sophisticated desktop analytical tool. The BrandPulse Internet solution can search millions of data points to tell your company:
What's the buzz about my brand/show? Is it growing? Shrinking?
Is my advertising message penetrating the clutter of other branding noise?
Where are people hanging out on the Internet to discuss my brand, my competitors?
What new products/product improvements should be made?
What are the new phrases, ideas and concepts being discussed on the Internet about my brand?
How does the Internet influence consumer purchase decisions and brand loyalty?
Twenty percent of consumers who purchased a 2001 or 2002 vehicle sought the advice of other consumers online before they bought. Based on 2002 automotive sales figures, that's about 3 million consumers! With the rapid growth of Internet message boards and discussion rooms about a wide range of products and services, companies can no longer afford to look at online consumer discussion data anecdotally. Thanks to Intelliseek's technology, this data is carefully sorted and analyzed, supplementing traditional research data and providing a 360-degree view of consumer understanding. Our clients say it is some of the best research available for Marketing Intelligence.
How Can BrandPulse Internet™ Be Used?
Buzz Measurement and Analysis
Marketing Message Optimization/Advertisement Effectiveness
New Initiative/Product Assessment
Product Research and Trend Analysis
Customer Satisfaction & Loyalty Management
BrandPulse Internet is Unique
Other vendors offer clipping services, surveys or some Internet monitoring, but their clients are often left with mountains of uninterpreted data or minimal support for analyzing the collected information. BrandPulse goes a step further, thanks to the search technology that powers it. Its proprietary methodology includes four phases: discovery, mining, analysis, and reporting.
Content Discovery: The BrandPulse solution scours the Internet to find relevant discussions about brands, issues, trends and hot topics, and BrandPulse’s broad coverage finds and collects conversations happening on the Internet across three main areas: Large Portals (USENET, general interest sites, etc.), Industry Verticals (specialized sites, such as automotive or health web sites) and MicroSites and smaller communities, including web logs. Intelliseek can add new discussions as they emerge or can go back in time across information that’s been indexed over the past three years.
Content Mining: Intelliseek’s content mining capabilities are rooted in machine-learning and natural language processing technologies that mine unstructured data — vast amounts of raw text — to discover the intelligence it contains. These technologies are able to identify key phrases and words, detect the nature and strength of sentiment in text, classify and categorize data to provide meaning and relevance, and extract specific facts and data points to create the meaning and context that lead to intelligence.
Analysis: Data is sorted by source, volume, and other metrics that help provide insights on concepts, sentiments and trends in consumer opinions. The analysis tools provide totally new ways of considering and measuring marketing data and factors, including competitive benchmarks, overall consumer “buzz” about certain products or brands, and the likelihood of active, influential consumers spreading their opinion and influence to others (virality).
Reporting: Once information is discovered, the BrandPulse solution is able to break down information about brands, companies or issues across any number of metrics, all in near real-time. We are able to create thousands of reports in a customized manner for any client.
The result? Marketers, market researchers and business development professionals get a holistic view of customer insights in order to make smarter, more informed, strategic marketing decisions. Plans can be executed efficiently and intelligently.
For $2,300, you get full access to new development tools. It's just like having your own discovery server, but cheaper.
Subscription provides a one-year development, evaluation, and test license to the following software, education and service components.
SAP NetWeaver Developer Studio
Build J2EE-based multi-tiered business applications using the Eclipse-based integrated development environment.
Develop and modify SAP R/3-based client-server applications using an integrated development environment designed specifically for the ABAP language.
Create customized Java- or ABAP-based user interfaces for your applications using the drag-and-drop features of this tool.
SAP NetWeaver Visual Composer
Develop pattern-based or freestyle user interfaces, and define the flow of data between them using this tool which utilizes a simple drag-and-drop style of model-driven development.
SAP NetWeaver Platform
The SAP NetWeaver platform, which includes applications such as Business Intelligence (BI), Process Integration (PI), the Java Application Server, Master Data Management (MDM), Mobile, and Knowledge Management (KM).
Patches and Updates
Download the latest patches and updates from the SAP Service Marketplace and access the knowledge base for the latest information.
Access to the Enterprise Services Workplace
Test your applications against SAP’s repository of enterprise services.
Premium Access in Forums
Get faster responses to your technical questions from the community by having your posts highlighted in the forums so they have more visibility. Your premium status enables you to award up to double the regular points for the best responses.
Virtual SAP TechEd
Access online recorded sessions from worldwide SAP TechEd events. Included are presentation slides, synchronized audio, streaming video, and downloadable PDF files.
SAP NetWeaver Platform Product Box
Receive a product box with the set of DVDs and installation documentation for the SAP NetWeaver platform so you can easily get started.
One of the highlights of the study – the 15.7 percent growth in the OLAP market was the highest rate in the past five years.
For those of us who are SAP-centric, it's important to keep in mind that large companies have implemented an average of 3.4 data warehouse packages (source: CIO), and that SAP BW has a relatively small share of the overall data warehouse market. As reported by Pendse, …"IBM, Oracle and SAP, which dominate in other markets, are all relatively weak in the OLAP market…. (but) many SAP sites feel obliged to install and use Business Information Warehouse (BW), though there are very few successful deployments and large volumes of shelfware. But, despite this, we estimate that SAP BW deployments grew significantly in 2004."
The OLAP Report: Market Share Analysis
by Nigel Pendse
Summary: The study reveals that the online analytical processing (OLAP) market has recorded its best growth since the downturn in 2000, to become a market worth $4.3 billion.
The OLAP Report recently published details of the 2004 OLAP Market Shares Report (www.olapreport.com). The study reveals that the online analytical processing (OLAP) market has recorded its best growth since the downturn in 2000, to become a market worth $4.3 billion. Despite the maturity of the market, the OLAP market continued to expand faster than most other enterprise software sectors.
Nigel Pendse, leading OLAP and business intelligence analyst and author of the study comments, "Despite the consolidation in 2003, the market still remains relatively fragmented, with three of the top five vendors losing net market share in 2004, and the other two making modest gains. As the market matures, growth normally slows due to a degree of market saturation, but the 15.7 percent growth rate of 2004 demonstrates that sales of OLAP products are accelerating."
Pendse predicts that 2005 will also show strong growth: "Microsoft will continue to make further gains in 2005, boosted primarily by partner product enhancements, as the Yukon release of SQL Server won't be released until the end of the year and so will have little impact on its 2005 market share. MicroStrategy will continue to do well, propelled by its newly released version 8 software, and SAP's OLAP share will increase based on the bundling of Business Information Warehouse as part of mySAP. Further market consolidation is inevitable and vendors will need to reassess their market strategy in light of the Microsoft threat."
2004 was the best year since 2000 for the OLAP industry, and all the vendors grew, though of course to varying extents. This caused changes in the market shares, but as there was no major consolidation in the year, the changes were less dramatic than in some previous years.
The long-forecast consolidation in the BI industry took off in 2003, with Business Objects, Cognos and Hyperion all widening their product ranges with significant acquisitions. However, neither Brio Software nor Crystal Decisions had significant OLAP business, so these acquisitions, though significant for the BI market as a whole, had relatively small impacts on OLAP market shares. Despite this consolidation, the market still remains relatively fragmented compared to, say the ERP or database markets, with three of the top five vendors losing net market share in 2004, and the other two making modest gains.
The OLAP market continued to grow faster than most other enterprise software sectors, though still at a lower rate than in the 1990s. There were signs of growing demand towards the end of 2003, plus the weak dollar magnified the organic growth through increasing the impact of sales made in stronger currencies, particularly in Europe. These trends continued into 2004, which showed the best growth rate since 2000. The early signs are that the market has remained buoyant in 2005.
However, despite the growth in OLAP sales and usage, it is becoming ever more difficult to estimate the exact size of the whole market and the individual market shares. The larger generalist vendors – Microsoft, Oracle, SAP, Business Objects – cannot even measure their OLAP business themselves, because their OLAP capabilities are often delivered as part of larger, bundled products and account for a minority of their revenues. For example, Microsoft Analysis Services and the OLAP capabilities of SAP BW are not sold separately, but are included as part of product suites. Other vendors, such as Hyperion, do track but no longer publish details of their individual product sales, which makes it harder to isolate their OLAP business. It is therefore necessary to estimate the extent to which customers deploy the OLAP products and this is necessarily approximate. Our aim is to provide a good indication of trends, but not to claim precision to the last decimal place.
Although the OLAP market is consolidating, it is still much more open than most enterprise software markets. Microsoft has now clearly overtaken Hyperion Solutions to become the largest OLAP vendor, but neither could be called "dominant." Although Microsoft’s lead is likely to increase further in 2005, it will still not have anything like the dominance it enjoys in some other markets. In fact, it may only have the same OLAP revenue market share in 2005 as Hyperion Solutions enjoyed immediately after its ill-fated merger in 1998. Microsoft has no major OLAP product releases expected until late 2005, so it is selling a five-year-old product in 2005 and is lucky not to be losing ground to more recently updated competitors. Microsoft still has no strong OLAP client tools, unlike the other OLAP server vendors, though the progress made by third-party client tool and application partners helps Microsoft’s market share.
Similarly, IBM, Oracle and SAP, which dominate in other markets, are all relatively weak in the OLAP market, and Oracle in particular has had several years of decline. Oracle may start to stage a recovery in 2005 as it completes its new generation product line, but it has now lost too much ground to ever recover the leading position it held in the mid-1990s. Many SAP sites feel obliged to install and use Business Information Warehouse (BW), though there are very few successful deployments and large volumes of shelfware. But, despite this, we estimate that SAP BW deployments grew significantly in 2004.
Other Points of Interest
The 15.7 percent growth in the OLAP market was by far the best since 2000.
Some of the reported "growth" was caused by the weak dollar.
Microsoft continued to increase its lead.
MicroStrategy was the fastest growing vendor.
There was no significant further consolidation in 2004.