Data Mining in the PerformancePoint "MAP" Framework

For the last couple of weeks I've had some fun working on a data mining engagement, which is rare for me. The team I'm on focuses on PerformancePoint Server 2007 (PPS), which at this time does incorporate data mining in the PPS Planning application, but data mining certainly isn't a big part of the overall "MAP" (Monitor, Analyze, Plan) framework. Many practitioners still view the data mining functionality of Analysis Services 2005 as a fringe (or very advanced) component of the Microsoft BI stack. That's a shame, since planning involves more than the business-centric forecasting and trending in the current PPS Planning application.

For those with a strong business and accounting background but not a strong IT background (such as accountants and especially the MBA types) who hope to consult on PPS implementations, data mining offers an angle by which they can provide high-end value. By "high-end value" I mean the sort of skills in the PPS world that take years to truly master, such as enterprise-class implementations of SQL Server 2005 Analysis Services (SSAS) and Microsoft Office SharePoint Server (MOSS), which involve issues beyond what rote "best practices" white papers cover. (And, I say facetiously, "for which you make the big bucks.") This is as opposed to the relatively easily obtained skill of building dashboards with the PPS Dashboard Designer or designing Excel BI reports. Planning is about developing a strategy to resolve a problem. It is about attempting to see into the future, which is tougher than analyzing the past.

The Monitoring and Analytics (M&A) pieces of the PPS MAP framework have existed for a few years in their former incarnations, roughly as the "Business Scorecard Manager" and ProClarity. The Planning piece is brand new. Planning (the "P" in MAP) is the toughest piece to tackle from a software developer's point of view, since planning in the complete sense is the most complex and variable activity. Therefore, the capabilities of the current PPS Planning application are somewhat limited in scope to the semi-defined confines of financial budgeting and financial forecasting.

Keeping an eye out for something that may be wrong (Monitoring) and then trying to figure out what is wrong (Analyzing) are simpler tasks than trying to figure out what to do about it (Planning). Figuring out what to do about a problem goes well beyond the finance-based capabilities of the PPS Planning application. For example, forecasting the sales of a newly developed product involves many steps and much insightful thought. Forecasting the sales of an existing product is feasible with enough history. But a new product is presumably unique in some way, and what makes it new rests on a unique theory. One would start by determining the target customer segment using clustering, then identify products that could induce similar behavior using the association algorithm, and finally use that information to infer a sales forecast.
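The three steps above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the SSAS Microsoft Clustering or Microsoft Association Rules algorithms: plain k-means stands in for clustering, rule confidence stands in for the association step, and the customer data, the "laptop" anchor product, the target profile, and the forecast heuristic (segment owners of the anchor times the analog's attach rate) are all assumptions made up for the example.

```python
import math

# Hypothetical sample: customer -> ((age, annual spend in $K), set of products bought)
CUSTOMERS = {
    "c1": ((25, 2.0), {"laptop", "mouse"}),
    "c2": ((27, 2.5), {"laptop", "mouse", "bag"}),
    "c3": ((24, 1.8), {"laptop", "bag"}),
    "c4": ((55, 9.0), {"printer", "paper"}),
    "c5": ((60, 8.5), {"printer", "paper", "laptop", "mouse"}),
    "c6": ((58, 9.5), {"printer"}),
}


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iter=50):
    """Plain k-means; seeds with the first k points so results are repeatable."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def confidence(baskets, antecedent, consequent):
    """Association-rule confidence: P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


def forecast_new_product(customers, target_profile, anchor, k=2):
    """Hypothetical three-step heuristic: segment, find an analog, scale."""
    ids = list(customers)
    points = [customers[c][0] for c in ids]
    baskets = [customers[c][1] for c in ids]

    # 1. Clustering: the target segment is the cluster nearest the target profile.
    centroids, labels = kmeans(points, k)
    target = nearest(target_profile, centroids)
    segment = [c for c, label in zip(ids, labels) if label == target]

    # 2. Association: the existing product most often bought with the anchor
    #    stands in as the analog for the new product (ties go to alphabetical order).
    candidates = set().union(*baskets) - {anchor}
    analog = max(sorted(candidates), key=lambda p: confidence(baskets, anchor, p))
    rate = confidence(baskets, anchor, analog)

    # 3. Forecast: segment members who own the anchor, times the analog's attach rate.
    owners = sum(anchor in customers[c][1] for c in segment)
    return segment, analog, owners * rate


segment, analog, expected = forecast_new_product(CUSTOMERS, (26, 2.2), "laptop")
print(segment, analog, expected)
```

With this toy data the segment is `['c1', 'c2', 'c3']`, the analog is `mouse` (confidence 0.75), and the forecast is 2.25 buyers. Step 3 is where human judgment really comes in; the multiplication is only a placeholder for it.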

Planning can address issues within closed or open systems. I bring up this topic because the engagement I was on involved a forecast in a rather open system, so I wish to offer this differentiation. Closed systems are much easier to work with, as variables are tightly controlled. Closed systems are finite, so mapping the relationships among their aspects is feasible. Examples of closed systems include manufacturing plants and most machinery, such as cars. However, closed systems still don't live in a vacuum, so they are generally encased within systems that protect the consistency of their variables. Think about all the systems that protect a car's systems from the vagaries of the outside world, giving the closed system the illusion that it is truly closed. For example, the radiator keeps the engine at a consistent temperature.
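The radiator is a negative feedback loop, the kind of protective mechanism that makes a closed system behave predictably. The toy model below uses made-up constants and is not real engine thermodynamics: each step the engine adds a fixed amount of heat, and a hypothetical radiator removes heat in proportion to how far the engine is above ambient. The temperature then settles at the fixed point ambient + heat / cooling coefficient, whatever the starting temperature.

```python
def simulate_engine(start_temp, steps, heat_per_step=10.0, cooling_coeff=0.2, ambient=20.0):
    """Toy negative-feedback loop; returns the temperature after each step.

    Each step the engine adds a fixed amount of heat, and the hypothetical
    'radiator' removes heat in proportion to how far the engine is above ambient.
    """
    temp = start_temp
    history = [temp]
    for _ in range(steps):
        temp += heat_per_step - cooling_coeff * (temp - ambient)
        history.append(temp)
    return history


def equilibrium(heat_per_step=10.0, cooling_coeff=0.2, ambient=20.0):
    """Fixed point where heat added equals heat removed: h = k * (T - ambient)."""
    return ambient + heat_per_step / cooling_coeff


target = equilibrium()
for start in (20.0, 150.0):
    print(start, "->", round(simulate_engine(start, 100)[-1], 3), "target", target)
print("no radiator:", simulate_engine(20.0, 10, cooling_coeff=0.0)[-1])
```

The distance from the fixed point shrinks by a factor of (1 - cooling_coeff) each step, so the loop converges for any 0 < cooling_coeff < 2. With `cooling_coeff=0.0` (no radiator) the temperature simply climbs 10 degrees per step without bound.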

Open systems include the weather, environmental impact, and worst of all, customer behavior (the open system involved in the engagement I speak of). These are systems out in the wild, with few or no systems protecting any mechanisms. Rules revealed through data mining will probably be different under even slightly different circumstances. Relationships would be virtually impossible to map and manage, as they would be too numerous and complex. (I often tell this story to illustrate the impossible complexity of the world: How Quickly Things Become Impossibly Complex.)
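A bit of arithmetic shows why the relationships outrun any attempt to map them. Treating each influence on customer behavior as a factor, n factors give n(n-1)/2 pairwise relationships, and if any group of two or more factors might interact, there are 2^n - n - 1 such groups. This sketch assumes every such combination is a candidate relationship; it only counts possibilities and says nothing about which ones actually matter.

```python
from math import comb


def relationship_counts(n_factors):
    """Return (pairwise relationships, possible interaction groups of size >= 2)."""
    pairwise = comb(n_factors, 2)                  # n(n-1)/2
    interactions = 2 ** n_factors - n_factors - 1  # all subsets minus the empty set and singletons
    return pairwise, interactions


for n in (5, 10, 20, 30):
    pairs, groups = relationship_counts(n)
    print(f"{n:>2} factors: {pairs:>3} pairs, {groups:,} interaction groups")
```

Thirty factors, hardly an exhaustive list for a purchase decision, already yield 435 pairs and 1,073,741,793 candidate interaction groups.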

Now think about how complex forecasting for a new product would be, since the results ultimately depend on customers actually buying the product. In other words, the forecast rests on customer behavior. How many factors can you think of that affect customer behavior? How many steps lie between the release of an advertisement and the actual purchase?

The big difference a planner will face between open and closed systems is that planning for open systems involves a high degree of "art". By "art" I mean that deciphering a complex system requires a high degree of versatile human intelligence (as opposed to the deterministic intelligence of most current software). It's unfair to use the phrase "more art than science" in the contemptuous manner in which I hear it uttered. One must take care to differentiate art from chaos or superstition.

"Art" is actually the epitome of human intelligence. Art should be thought of as something non-deterministic that cannot be automated by computer or robot. Putting Jackson Pollock aside, all works of art (in the usual sense of "art") require a great deal of technical skill.

Generally there is less art involved with closed systems. Planning within closed systems can usually follow well-known techniques ... "best practices". Data mining may still reveal things that are hard to see. However, once the rules are revealed, they change only when the closed system changes, which usually happens in a disciplined, controlled manner.

Planning open systems where there is unimaginable complexity and a lack of control is where versatile intelligence is required. There are at least four major categories of skill one must develop in order to plan in open systems:

  1. Statistics. Statistics is the foundation of data mining; "data mining" is a set of techniques based on statistics.
  2. The use of data mining tools and techniques. When do you use clustering or decision trees? How can they be used in concert? Two good books are Data Mining with SQL Server 2000 Technical Reference and Data Mining with SQL Server 2005. The first is still worthwhile even though it covers SQL Server 2000, because its analogies are good.
  3. Data Warehousing - I use the term, "Data Warehousing" (which includes ETL), to describe the science and art (I list science first on purpose) of procuring data required for your analysis. The data mining practitioner's ability to procure data independently (as opposed to waiting for IT) gives them the necessary freedom to artistically data mine. How do you find, access, "massage", and store data? Mastering the "Data" tab of an Excel 2007 spreadsheet will make you a "Data Warehousing Lite" practitioner, and that may be good enough for 90% of the aspiring data miners out there for 90% of the time.
  4. Systems thinking. This is an about-face from the compartmentalized way we normally attack a problem.
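On item 2's question of using the tools in concert, one common pattern is to cluster customers first and then fit a decision tree to the cluster labels, so each segment gets a readable rule. The sketch below is a hypothetical minimal version: a one-level tree (a decision stump) chosen by weighted Gini impurity, with made-up customer rows and cluster labels assumed to come from an earlier clustering run. A real decision tree would keep splitting each side recursively.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher when labels are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(v) / n) ** 2 for v in set(labels))


def best_stump(rows, labels, feature_names):
    """Return (weighted Gini, feature name, threshold) for the best single split."""
    best = None
    n = len(rows)
    for f, name in enumerate(feature_names):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:  # ties keep the earlier split
                best = (score, name, threshold)
    return best


# Hypothetical customers (age, annual spend in $K) and their cluster labels.
rows = [(25, 2.0), (27, 2.5), (24, 1.8), (55, 9.0), (60, 8.5), (58, 9.5)]
labels = [0, 0, 0, 1, 1, 1]
score, feature, threshold = best_stump(rows, labels, ["age", "annual_spend_k"])
print(f"IF {feature} <= {threshold} THEN segment 0 ELSE segment 1 (impurity {score})")
```

Here age alone separates the two hypothetical segments at 41.0. Annual spend splits them just as cleanly at 5.5, and the tie goes to the first feature checked; deciding which rule is meaningful is a business question, not a computational one.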

Systems thinking will probably be the toughest, since the concepts are somewhat fuzzy and not as deterministic as we've come to expect from the "scientific", best-practices way of thinking. My personal blog site lists my favorite "non-traditional" performance management books that speak to this topic of "data mining is planning" and systems thinking: Non-Traditional PM Books. If I had to pick one, I would pick The Fifth Discipline, by Peter M. Senge.

Ultimately, when installing and troubleshooting SSAS and MOSS reach a higher level of maturity (when those tasks can be handled in a rote, automated manner and become a commoditized, outsourceable skill), data-mining-based planning will be what's left in the MSFT Performance Management world that relies primarily on the versatility of human intelligence, and it will thus reward the practitioner as a high-end skill. A good read on this topic is Super Crunchers.