Monthly Archives: October 2007

ECC 6.0 Release Notes

This is a good resource…

Release Note link for ECC 6.0 – Updated 03/27/08

Google Acquires Chicago’s Feedburner for $100 million

In June 2007, Google purchased the Chicago company FeedBurner for a reported $100 million. Since then, FeedBurner has been integrated nicely into Google’s rapidly expanding suite of publishing and web advertising vehicles. The tools here are impressive, easy to use, and free of charge. Here’s my FeedBurner link.

The World Energy Modeling Project

From an article written by Dick Lawrence, co-founder of ASPO-USA

Monday, 06 August 2007

Energy is at the foundation of every aspect of our present globalized economy. Without adequate energy, our still-growing world population, increasingly urbanized and industrialized, faces the prospect of reduced standards of living, declining access to food and clean water supplies, and contraction of global trade and GDP.

In the next decade and beyond, policy decisions will be made at national and global levels that have consequences to large segments of the Earth’s human population and to the world environment. These decisions will directly and indirectly impact energy and resource availability, human well-being, and the sustainability of the environment on which all economies ultimately depend.

Understanding the complex relationships between energy, the economy, human living standards, and national policy decisions is a difficult task. Well-informed observers often arrive at opposite conclusions, even when in possession of the same facts. How can we cut through the morass of conflicting opinions and develop a better understanding of the consequences of policy decisions?

Increasingly, researchers turn to computer-based dynamic-systems modeling techniques when they are trying to understand complicated systems. Thirty-five years ago, colleagues of Jay Forrester at MIT published the results of a study called Limits to Growth, which attempted to look at the global human population and its relationships to resources, food supply, pollution, and more.

In the 1980s, Robert Kaufmann co-authored, with three others, a study of energy flow through the U.S. economy in Beyond Oil (last updated in 1992). That study was the inspiration for our proposal to model energy flow at the global level, first shown to ASPO members and attendees at the 2004 Berlin conference.

This year, ASPO-USA developed a Request for Proposals and distributed it to organizations and academic groups with the resources and skill sets to implement such a model. After reviewing the proposals, we decided to merge the capabilities of two responders into a combined project team. ASPO-USA brought the two groups together in mid-May of 2007 and officially launched the project.

The two teams are:

  • Millennium Institute – main model development, building on the foundation of their T21-USA model, which has substantial energy components.

  • State University of New York – Environmental Science and Forestry (SUNY-ESF) – creation of the “energy core” of the model, including EROI database and feedback paths. ESF will also develop new graphical user interfaces.

The teams will develop the North America model (U.S., Mexico, Canada) over the summer of 2007, performing initial model runs in September. They will then expand the scope of the model to the global level, completing development by (approximately) mid-2008.

We want the model to be capable of answering the following questions:

  • Given the finite and increasingly limited availability of fossil fuels, and a growing supply-demand mismatch, what is the best use to which we can put remaining supplies of “cheap” oil and gas?

  • How much of our present and near-term fossil-fuel supply should be diverted to developing sustainable / renewable energy resources in a way that minimizes negative impacts on food production, water supply, per-capita energy availability, and quality of life for residents in developed, developing and under-developed nations?

  • What would be the consequences of delaying accelerated or “crash” programs by one or two decades? (see “the Hirsch Report”)

  • What are the net-energy consequences of a variety of likely mixes of energy sources (e.g., a specified mix of conventional fossil fuels, biofuels, nuclear, and renewables)?

  • How much can biofuels (ethanol, biodiesel) contribute to energy supply without negatively impacting food supply or prices?

  • To what extent do limits on water availability restrict energy development?

  • What is the CO2 emissions impact for likely future energy scenarios? (CO2 emissions will be tracked for all scenario runs).

  • What is the energy cost of CO2 sequestration? Is it feasible on a large scale?

  • Is a “hydrogen economy” feasible? What are the net-energy and environmental implications of different approaches to hydrogen production? How does the “hydrogen economy” compare with an all-electric transportation scenario?

  • Can we substitute energy products from tar sands, shale oil and coal (CTL) for conventional liquid fuels? If so, how long would these resources actually last at different growth rates?

  • As wealth flows into energy-exporting nations from energy importers, standards of living and demand for products and energy rise in the exporting countries. What are the consequences for availability of energy supply, and energy costs, for importing nations?

These are, of course, preliminary questions. Over time, new questions will be put to the model. A comprehensive and well-tested model will be able to answer new questions as they arise with only minimal modifications, if any.

The model incorporates complex relationships between energy, the economy, agriculture, industry, transportation, and the environment, including tracking CO2 emissions for all scenarios. Like the groundbreaking Limits to Growth more than three decades earlier, its results are not predictions, but provide insight into the consequences of economic and energy policy decisions. The model provides guidance that permits investigators to better understand the impacts of regulation, financial investment and incentives, and energy policy, and to analyze the consequences of developing various future mixes of energy sources.

Varying estimates of fossil fuel supply constitute different scenarios – for example, ASPO’s estimate of recoverable oil and gas vs. the USGS/EIA estimates give us two scenarios we can run to explore the consequences of those supply assumptions. During a scenario run, decisions are made which influence the outcome. The results will be collected and analyzed to understand which decisions yield preferred outcomes. We will disseminate the results of model runs to a broad audience of academics, energy researchers, the public, and (most importantly) to policy-makers at all levels of government.

Recent studies, like “The Hirsch Report” commissioned by U.S. DOE (2005), warn of potentially serious consequences if we fail to respond in time to the threat of depletion of fossil fuel supplies. A model of world energy flow will permit a more detailed investigation of these scenarios and what energy policy decisions, and timing of implementation, will best reduce the impact of depletion.

Climate change is obviously a critical topic now getting enormous media and political attention. While the model will not attempt to capture the complex relationships between anthropogenic CO2 emissions, climate, and the human economy, it will monitor CO2 emissions for all scenarios. The consequences of those emissions – temperature changes, regional and global weather changes, agricultural impacts – may be factored into some scenarios.

The model will account for and track flows of energy and materials based on physical laws (i.e. energy and matter cannot be created from nothing). It will access a database of EROI (energy return on energy invested) for all forms of energy – conventional, renewable, and unconventional. The model will show what is possible, given known constraints on energy availability, material resources, and financial capital.
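The stock-and-flow accounting described above can be sketched in a few lines of code. This is a toy illustration only: the resource numbers and the declining-EROI assumption are invented for the example and are not drawn from the actual T21 or ASPO models.

```python
# Toy stock-and-flow sketch of resource depletion with declining EROI.
# All numbers are illustrative, not taken from any real model.

def run_scenario(initial_stock=2000.0, extraction=30.0, years=50):
    """Track gross and net energy as a finite resource depletes."""
    stock = initial_stock
    results = []
    for year in range(years):
        gross = min(extraction, stock)   # can't extract more than remains
        stock -= gross
        # Assume EROI falls as the easy resource is used up:
        # from roughly 20:1 at the start toward 5:1 near exhaustion.
        fraction_left = stock / initial_stock
        eroi = 5 + 15 * fraction_left
        net = gross * (1 - 1 / eroi)     # energy left after extraction costs
        results.append((year, gross, eroi, net))
    return results

for year, gross, eroi, net in run_scenario()[::10]:
    print(f"year {year:2d}: gross={gross:5.1f}  EROI={eroi:4.1f}  net={net:5.1f}")
```

Even this crude sketch shows the key net-energy point: as EROI declines, society’s usable energy falls faster than gross extraction does.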

We will develop the world energy model as an “open source” project – anyone with Internet access will be able to run the model and view the results of scenario runs.

One goal of the project is to develop a simple game-like user interface that makes the model accessible to those without experience in modeling complex systems. Others with more expertise will be able to go into the model, understand how it works, and develop their own scenarios. Model users from around the world will be able to communicate with each other using a web site dedicated to model discussion, modification, and operation.

Dick Lawrence is a co-founder of ASPO-USA.


The Limits to Growth – Donella Meadows, Dennis Meadows, et al. – Universe Books, 1972

Beyond the Limits – Donella Meadows, Dennis Meadows, Jørgen Randers – Chelsea Green Publishing Co., 1992 – ISBN 0-930031-55-5

Beyond Oil – Gever, Kaufmann, Skole, Vorosmarty – Carrying Capacity, Inc., 1986 – ISBN 0-88730-075-8 (PB)

Peaking of World Oil Production: Impacts, Mitigation, & Risk Management – Robert Hirsch, Roger Bezdek, Robert Wendling – February 2005; available online at:

YouTube – In a Moment

My daughter Avery wrote, filmed, and directed this independent short film in her junior year of high school.

Last FM – Dashboard Confessional

This is the player widget from the nifty Last.fm site, in this case tuned to the Dashboard Confessional tag. I have to try to get this into the sidebar at some point.

One of the cool things I’ve used the site for is to find artists according to tags, such as "Langue Francaises".

What Data Mining Can and Can’t Do

Peter Fader, professor of marketing at the University of Pennsylvania’s Wharton School, is the ultimate marketing quant—a world-class, award-winning expert on using behavioral data in sales forecasting and customer relationship management. He’s perhaps best known for his July 2000 expert witness testimony before the U.S. District Court in San Francisco that Napster actually boosted music sales. (Napster was then the subject of an injunction for copyright infringement and other allegations brought against it by several major music companies.)

The energetic and engaging marketing professor has a pet peeve: He hates to see companies waste time and money collecting terabytes of customer data in attempts to make conclusions and predictions that simply can’t be made. Fader has come up with an alternative, which he is researching and teaching: Complement data mining with probability models, which, he says, can be surprisingly simple to create. The following is an edited version of his conversation with CIO Insight Executive Editor Allan Alter.

CIO INSIGHT: What are the strengths and weaknesses of data mining and business intelligence tools?

FADER: Data mining tools are very good for classification purposes, for trying to understand why one group of people is different from another. What makes some people good credit risks or bad credit risks? What makes people Republicans or Democrats? To do that kind of task, I can’t think of anything better than data mining techniques, and I think it justifies some of the money that’s spent on it. Another question that’s really important isn’t which bucket people fall into, but when will things occur? How long will it be until this prospect becomes a customer? How long until this customer makes the next purchase? So many of the questions we ask have a longitudinal nature, and I think in that area data mining is quite weak. Data mining is good at saying whether something will happen or not, but it’s not particularly good at saying when it will happen.

Data mining can be good for certain time-sensitive things, like is this retailer the kind that would probably order a particular product during the Christmas season. But when you want to make specific forecasts about what particular customers are likely to do in the future, not just which brand they’re likely to buy next, you need different sets of tools. There’s a tremendous amount of intractable randomness to people’s behavior that can’t be captured simply by collecting 600 different explanatory variables about the customer, which is what data mining is all about.

People keep thinking that if we collect more data, if we just understand more about customers, we can resolve all the uncertainty. It will never, ever work that way. The reasons people, say, drop one cell phone provider and switch to another are pretty much random. It happens for reasons that can’t be captured in a data warehouse. It could be an argument with a spouse, it could be that a kid hurt his ankle in a ballgame so he needs to do something, it could be that he saw something on TV. Rather than trying to expand data warehouses, in some sense my view is to wave the white flag and say let’s not even bother trying.

Do you think people understand the limitations of data mining?

They don’t. And this has nothing to do with data mining or marketing, but it has a lot to do with human nature. We’re seeing the same issues arising in every area of science. As data collection technology and model-building capabilities get better, people keep thinking they can answer the previously unknowable questions. But whether it’s the causes of diseases or mechanical failure, there’s only so much we can pin down by capturing data.

Do people who use data mining packages understand enough about how to use them?

I can’t make generalizations that are too broad, but there are some people who are hammers looking for nails. They think they can answer any problem using one set of procedures, and that’s a big mistake. When you go into other domains, you need to pull out different tools. One of the things that just makes me crazy is when people misuse the kinds of statistics that are associated with data mining. A lift curve will show us how well our predicted rank order of customer propensities corresponded to their actual behavior. That’s a fine thing to do in a classification setting, but it’s not particularly diagnostic in a longitudinal setting. We want ‘when’-type diagnostics to answer ‘when’-type questions. People just aren’t looking in the right places to see whether their model’s working.
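The lift statistic Fader mentions is easy to compute, which may be part of why it gets applied even where it isn’t diagnostic. A minimal sketch (the `scores` and `outcomes` data below are made up for illustration):

```python
# Lift in a classification setting: rank customers by predicted propensity
# and compare the hit rate among the top-ranked against the overall base rate.

def lift_at(scores, outcomes, top_fraction=0.1):
    """Hit rate in the top fraction of ranked customers,
    divided by the overall hit rate (lift = 1.0 means no better
    than random targeting)."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: -pair[0])
    k = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(outcomes) / len(outcomes)
    return top_rate / base_rate

# A model that ranks well: the actual buyers mostly got high scores.
scores   = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1,   1,   0,   1,   0,   0,   0,   0,   0,   0]
print(round(lift_at(scores, outcomes, top_fraction=0.2), 2))  # → 3.33
```

This answers “did we rank the right people?”—a classification question. It says nothing about *when* any of them will buy, which is exactly Fader’s complaint about using it for longitudinal problems.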

Exactly what do you mean by a propensity as opposed to a behavior?

The difference is that just because people have a tendency to do things doesn’t mean that they will. You might be someone who buys from Amazon once a month on average. Does that mean over the next 10 years, over the next 120 months, you’ll buy 120 items? No. You could go two years without buying, or you might buy five items in a given month. The amount of variability around your propensity is huge. That’s where all this randomness comes in.
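Fader’s once-a-month Amazon customer is essentially a Poisson process, and a quick simulation shows how wide the swing around a fixed propensity can be. This is my illustration, not his; it uses only the standard library (Knuth’s Poisson sampler).

```python
import math
import random

# A customer whose true propensity is one purchase per month still shows
# big month-to-month variation: some months zero, some months several.
random.seed(42)

def poisson(lam):
    """Draw one Poisson-distributed count with mean lam (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

months = [poisson(1.0) for _ in range(120)]   # 10 years of months
print("total purchases in 120 months:", sum(months))
print("quietest / busiest month:", min(months), "/", max(months))
```

The total lands near 120, but individual months scatter widely—the “huge variability around your propensity” Fader describes.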

Have companies hurt themselves by misusing data mining tools?

Let me start with a positive example. I have tremendous admiration for what actuaries do, and therefore for the way insurance companies deal with their customers. Actuaries will not look at all your characteristics and say when you will die. They’ll simply come up with a probabilistic statement about the likelihood that someone with your characteristics will die, or what percent of people who share characteristics will live to be 70. They understand that it’s pretty much impossible to make statements about each and every policyholder.

Now, carry that over to the marketing world. Lots of firms talk about one-to-one marketing. I think that’s a real disservice to most industries. One-to-one marketing only works when you have a very deep relationship with every customer. So one-to-one marketing works great in private wealth management, or in a business-to-business setting where you meet with the client at least once a month, and understand not just their business needs but what’s going on in their life. But in areas approaching a mass market, where you can’t truly distinguish each individual, you just have a bunch of people and a bunch of characteristics that describe them. Then the notion of one-to-one marketing is terrible. It will do more harm than good, because the customers will act more randomly than you expect, and the cost of trying to figure out what specific customers will do far outweighs the benefits you could get from that level of detail.

It’s very hard to say who’s going to buy this thing and when. To take that uncertainty and square it by looking across two products, or to raise it to the nth power by looking across a large portfolio of products, and say "these two go together," and make deterministic statements as opposed to talking about tendencies and probabilities, can be very, very harmful. It’s much more important for companies to come up with appropriate groupings of similar people, and make statements about them as a group.

I don’t want to pick on Amazon in particular; they really tout the capabilities of their recommendations systems. But maybe this customer was going to buy book B anyway, and therefore all the recommendations were irrelevant. Or maybe they were going to buy book C, which would have been a higher-margin item, so getting them to buy book B was a mistake. Or maybe they’re becoming so upset by irrelevant recommendations that they’re going away entirely. I don’t want in any way to suggest that cross-selling shouldn’t be done, but what I’m suggesting is that the net gains from it are less than people might think. It often can’t justify the kinds of investments that firms are making in it.

You’ve been championing the use of probability models as an alternative to data mining tools. What do you mean by a probability model?

Probability models are a class of models that people used back in the old days when data weren’t abundantly available. These modeling procedures are based on a few premises: People do things in a random manner; the randomness can be characterized by simple probability distributions; and the propensities for people to do things vary over time, across people, and across circumstances. Probably the best known example is survival analysis, which stems largely from the actuarial sciences. It’s also used in manufacturing. You put a bunch of lightbulbs on a testing board and see how long they last. In many ways, that’s what I suggest we do with customers. We’re not going to make statements about any one lightbulb, just like we shouldn’t make statements about any one customer. We’ll make collective statements about how many of these bulbs will last for 1,000 hours. It turns out that the analogy of survival analysis in manufacturing and actuarial and life sciences carries over amazingly well to customers. A lot of managers would bristle at the idea, but I think that metaphor is far better than all this excessive customization and personalization that’s been going on. Customers are different from each other just as lightbulbs are, but for reasons that we can’t detect, and reasons that we’ll have a very hard time taking advantage of.
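The lightbulb analogy translates into very little code. Assuming exponentially distributed lifetimes (the simplest survival model; the 1,500-hour mean is an invented parameter), the model makes only collective statements:

```python
import math

# Survival analysis in miniature: predict what fraction of a population
# lasts past time t, never the fate of any individual unit.

def survival(t, mean_life=1500.0):
    """Fraction of the population still 'alive' at time t,
    assuming exponentially distributed lifetimes."""
    return math.exp(-t / mean_life)

print(f"{survival(1000):.1%} of bulbs expected to last 1,000 hours")  # → 51.3%
```

Swap “bulbs” for “customers” and t for months of tenure, and this is precisely the kind of statement Fader argues marketers should be making.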

What kind of problems can probability models solve?

Probability models have three basic building blocks: One is timing: how long until something happens. One is counting: how many arrivals, how many purchases or whatever will we see over a given period of time. And one is choice: given an opportunity to do something, how many people will choose to do it. That’s it. Most real-world business problems are just some combination of those building blocks jammed together. For instance, if you’re modeling the total time someone spends at a Web site during a given month, you might model it as counting-timing: a count model for the number of visits and a timing model for the duration of each one. My view is that we can very easily build simple models in Excel for each of those three things. A lot of people have built this kind of model over the years, and have tested them very carefully, in some cases putting them directly up against data mining procedures. They have found that their capabilities are not only astonishing, but far better than data mining. If you think about all the different ways you can combine timing, counting and choice, you can tell all kinds of interesting stories about different business situations.

How would you use these models to identify the most profitable customers or calculate customer lifetime value?

This is where probability models can come together beautifully with data mining. We can use these models to come up with very accurate forecasts about how long this customer will stay with us or how many purchases they’ll make over the next year. So use the basic probability model to capture the basic behavior and then bring in data mining to understand why groups of customers with different behavioral tendencies are different from each other. You see, behavior itself is not perfectly indicative of the true underlying propensities, which is what managers really want to know. And so we build a probability model that helps us uncover the propensities, and then we can take those propensities (the customer’s tendency to do something quickly or slowly, or to stay online a long time or not) and throw those into the data mining engine to explain those as a function of the 600 variables. You’ll find a much more satisfying and fruitful explanation in terms of being able to profile new customers and understand the likely actions of current ones. When it comes to taking the outputs of the probability model and understanding them, data mining procedures are the best way to go.

Can probability models capture longitudinal or predictive information?

Very, very well. In fact, one of my favorite examples is looking at customer retention and return. You can do it simply without any explanatory variables at all. The irony is that if you bring in explanatory variables, in many cases the model will do worse. This makes managers crazy. They need to know why these people are different. But if you’re bringing in explanatory variables that aren’t really capturing the true underlying reasons for the differences, then you’re just adding noise to the system. Your ability to come up with an accurate forecast for each group might actually be worse.

So you use data mining to help you figure out why those propensities exist.

That’s right. The key is to explain the propensities (the tendency to do things) as opposed to the behavior itself.

You said these models can be built in a spreadsheet. It doesn’t sound like you have to be a high-powered Ph.D. to create them.

Of course, that never hurts. But yes, these models are far more transparent to managers because the stories they tell are simpler, the demands on the data are far simpler, and the implementation is much easier. So what I like to do is to start people out with some of the really simple models and get people hooked. Show me how many customers we’ve had in year one, two, three, four, five, and I’ll tell you how many we’ll have in year nine and ten before we even bring in all the explanatory variables that data miners want to use.

If companies move to using models more, what data can they stop collecting and what data will they still need to collect?

Ultimately, what matters most is behavior. That shouldn’t be a controversial statement, but a tremendous amount of the data that’s being collected is nonbehavioral. Data on demographics, psychographics, socioeconomics and even consumer attitudes can not only waste servers and storage space but can actually make the models perform worse. I have lots of examples of data that leads to tremendously misleading inferences about what really matters.

So behavior’s what matters most, and even then you can often summarize behavior in very simple ways. For instance, in many cases we find that you don’t even need to know exactly when each transaction occurred to make forecasts. Simply give me summary statistics, such as frequency. Just tell me when was the last time they made a purchase and how many purchases they made over the last year, and that will explain pretty much everything worth explaining. You mentioned that a CIO Insight survey found that the amount of customer data companies are collecting is increasing at an annual rate of about 50 percent. I would claim that most of that 50 percent is completely wasted. It’s one thing to have 50 percent more data, but you’re certainly not getting 50 percent more knowledge or insight. In fact, you could be doing more harm than good, because you’re crowding out the few variables that really do matter.
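Reducing a full transaction log to the recency and frequency summaries Fader describes takes only a few lines. The customer names and dates below are made-up illustrative data:

```python
from datetime import date

# Collapse a full purchase history into the two summary statistics that,
# per Fader, carry most of the predictive signal: recency and frequency.

transactions = {
    "cust_a": [date(2007, 1, 5), date(2007, 4, 2), date(2007, 9, 30)],
    "cust_b": [date(2007, 2, 14)],
}

def rfm_summary(history, as_of):
    """Recency (days since last purchase) and frequency (purchase count)
    per customer, as of a given date."""
    return {
        cust: {"recency": (as_of - max(dates)).days, "frequency": len(dates)}
        for cust, dates in history.items()
    }

print(rfm_summary(transactions, as_of=date(2007, 10, 31)))
```

Note what gets thrown away: the exact timing of every transaction but the last. Fader’s claim is that for forecasting purposes, these two numbers explain “pretty much everything worth explaining.”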

What companies have done a good job of using models this way?

I wish I could put some companies on a pedestal, but I’ve never seen a firm really embrace this stuff as fully as I’d like. And I’ll tell you why: It’s really my fault. It’s the fault of academics who spend almost no time teaching these procedures. Most firms just aren’t getting exposed to this stuff.

What should CIOs do to help their companies use analytical and modeling tools appropriately?

For one thing, remember that more is not necessarily better. CIOs often push back on analytics because of cost, but if someone could give them all this additional data for free, they’d take it. That’s often wrong. Additional data can actually harm you because you’re going to start capturing random, quirky, idiosyncratic things that aren’t related to the true underlying propensities. The flipside is that a few simple measures that have been around forever, like recency and frequency, are all you need. If you can use data collection technology to get those measures more accurately or on a timelier basis, then maybe it’s worth the investment. Second, remember that some surprisingly simple models can take you incredibly far if you’re willing to not worry so much about drivers. Don’t bother looking for the drivers; first, capture the behavior. So start simple; that often means start in Excel. You’d be amazed at how much you can accomplish without even having to leave the spreadsheet environment.