Why Isn't Predictive Analytics a Big Thing?

Originally Published: February 23, 2009
by Eugene Asahara

I'm baffled as to why Predictive Analytics still has the status of a fringe, niche, and advanced realm of the Microsoft Business Intelligence world. Business Intelligence without Predictive Analytics is like a bus taking you most of the way home from work, but dropping you off on the main drag, leaving you with a two-mile walk. Those two miles are doable and it is great exercise, but what if you're short on time, ill or injured, it's pouring rain, or the road is full of dangerous drivers? The bus may have taken you 90% of the way home in terms of miles, but that last 10% of miles is really half the journey in terms of time and effort.

Predictive Analytics as a whole is the discipline of formulating insights or forecasts based on statistics garnered through raw, historic data. This is essentially what human BI users ultimately do manually. Highly skilled analysts sift through oceans of data looking for patterns such as correlations, associations, similarities, and dependencies which are the elements of decisions.

Data Mining algorithms shipped in SQL Server Analysis Services since SQL Server 2000 provide functionality for automating, or at least semi-automating, this process. At this time, computers are better than humans at recognizing certain patterns, especially straightforward patterns buried in an overwhelming amount of data or set against a tricky "background" (like "Where's Waldo"). The human's ability, on the other hand, is more robust and versatile, and thus better at recognizing fuzzier, more complex patterns. Team up this "machine intelligence" and "human intelligence" and you have a smarter information worker.

Figure 1 depicts a tank commander [1] who must take action based on decisions which are based on data gathered from various sources and angles: his own senses, radar, satellites, his commanders, other tank commanders, etc. Without Predictive Analytics, the Tank Commander must expend more effort and time in order to make well-informed decisions. If he had all the time in the world and making a mistake didn't matter too much, this would be fine. Unfortunately, the tank commander is making life and death decisions against smart enemies who are also making decisions against him. Whoever makes the best decision first wins.

Figure 1 - The chasm between the OLAP cube and the Information Worker.

The life of the tank commander's counterpart in the corporate world, the line manager, isn't quite as mission-critical, but the analogy carries over. American corporate culture reflects military culture with a hierarchy of ranks and battles and strategic alliances with other corporations. The smartest, best-equipped, most optimized, most agile, most highly-trained, most strategically aligned corporation will win.

Notice that data mining as presented in Figure 1 bridges a chasm between data in an OLAP cube and the intelligent tank commander; it takes you that last step of the journey, which is often the hardest part of the journey. Also notice that data mining doesn't make decisions, it just assists with making decisions.

Data Mining is NOT Artificial Intelligence

Earlier I highlighted the word, "recognize", to point out that Data Mining as the term applies in the Microsoft BI world stops at the point of "recognizing things". That is, "recognizing things" is just the beginning of the process of strategy execution. Let us step back a little and think about life on Earth from the point of view of the interplay of species of creatures fighting towards the goal of "survival of the species" or even the interplay of corporations each fighting for the goal of success (profit, growth, dominance, etc).

From that point of view, the purpose of intelligence is to formulate a strategy towards those goals then to execute that strategy. Execution of a strategy takes this form:

              Strategy<--Execution<--Action<--Decision<--Recognition

Execution of a strategy consists of actions of all sorts. Actions executed by executives have broader scope for the corporate "creature", whereas actions executed by peons such as myself have more local, specialized scope. Actions are actually physical manifestations of decisions we arrive at in our heads.

Decisions are the result of a process of logic. The logic could be simple, as in a "no-brainer" decision. Or it can be complicated, taking into consideration dynamic, ambiguous factors. Such decisions are generally beyond the capability of software today and that is not what Data Mining is about.

The factors that are "plugged" into our decision process are "recognitions". Recognitions are values and are simpler concepts than decisions, actions, and strategies. Recognitions (values), like decisions, are the result of a process, but the process for recognitions consists of well-defined rules. For example, a recognition of "an up-tick in sales" is the result of summing the sales for today, summing the sales for yesterday, and determining if today's total is greater than yesterday's. Because the rules for recognitions are well-defined (there is no ambiguity about the rules), they are programmable. Microsoft's Data Mining algorithms are programs recognizing patterns by well-defined rules.
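The "up-tick in sales" recognition can be sketched in a few lines to show how little ambiguity such a rule contains. This is only an illustration in Python, not how Analysis Services implements anything; the sales records and the function names `total_for` and `is_uptick` are hypothetical.

```python
from datetime import date

# Hypothetical sales records as (sale date, amount). Illustrative data only.
sales = [
    (date(2009, 2, 22), 120.0),
    (date(2009, 2, 22), 80.0),
    (date(2009, 2, 23), 150.0),
    (date(2009, 2, 23), 95.0),
]


def total_for(day, records):
    """Sum the sale amounts recorded on the given day."""
    return sum(amount for sold_on, amount in records if sold_on == day)


def is_uptick(today, yesterday, records):
    """Recognize an up-tick: today's total is strictly greater than yesterday's."""
    return total_for(today, records) > total_for(yesterday, records)


# Yesterday totals 200.0 and today totals 245.0, so this prints True.
print(is_uptick(date(2009, 2, 23), date(2009, 2, 22), sales))
```

The rule produces a value (True or False) that can be plugged into a decision; it does not make the decision itself.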

In the example illustrated in Figure 1, the President determines the goal, the generals formulate the strategy, the junior officers execute the strategy, the soldiers (such as the tank commander) take action and make decisions, and the data mining algorithms recognize things that are hard for the tank commander to recognize in a timely and accurate manner.

In our civilian world, the quintessential Information Worker, a medical doctor, follows this pattern as well. Symptoms are recognized, a diagnosis is decided upon, and a treatment is set into action. This holds true for all Information Workers and is the basis for Performance Management.

So Again, Why isn't Data Mining a Big Thing?

As I mentioned, the data mining features of SQL Server Analysis Services have existed since SQL Server 2000. Why is data mining still so shamefully under-utilized? My opinion is that it is perceived as a high-end skill requiring extensive training beyond the capability and desire of the vast majority of people. In other words, only the really smart and geeky people, the people we either can't be or don't want to be, can do this. So that relatively small audience makes Predictive Analytics a niche, fringe realm, which is better served in niche, fringe scenarios by focused players such as SAS and SPSS than by a general player like Microsoft.

On the other hand, maybe it's not a matter of wondering why Predictive Analytics isn't bigger, but rather, why isn't Microsoft selling way more BI? BI without Predictive Analytics drops the customer off at "Here are your reports. Analyze away!" With all the hype around BI, I can see that it could sometimes leave a customer a bit underwhelmed.

There is also a sort of "Goldilocks and the Three Bears" syndrome. On one end of the spectrum, there are the folks at the university labs I've worked with who laugh at our data mining. Their requirements are quite sophisticated and our "<<fill in the blank>> for the masses" approach doesn't yield the right tool for them. On the other end of the spectrum are those I mentioned a couple of paragraphs ago who think of data mining as something only smart and geeky people do. Can we effectively implement Predictive Analytics without a PhD in math and an MBA?

BI is IT Brain Surgery

Microsoft has done an incredible job of making the tough task of getting a computer to recognize and highlight patterns from a chaotic ocean of data pretty accessible. See for yourself by playing with the tutorials. But many folks, both implementers and potential customers, are scared of DATA MINING and therefore shy away from it. Thus there currently isn't a critical mass of consultants, architects, or engagement managers willing to take the leap into data mining, nor demand from customers. I think most recognize the potential. But there is a chicken-and-egg problem: there is a lack of skill to implement a critical mass of successful predictive analytics engagements, and there isn't demand from customers in part because there isn't a critical mass of talented architects and consultants.

However, this lull in the economy presents an opportunity to break the chicken and the egg cycle. Idle consultants and architects can prepare for the eventual upswing, which will usher in fun IT things like Web 3.0 (Semantic Networks!) and the big leap to parallel processing as a mainstream development skill (if you old VB6 or COBOL programmers thought the leap from "procedural" to object-oriented was big ...).

Analysis Services abstracts away much of the ugly math and statistics. What is left for the successful Predictive Analytics consultant is to possess a great deal of business savvy and common sense. Additionally, even though a predictive analytics project may fall under well-defined categories such as Risk Analysis and Customer Churn, each implementation is at least somewhat unique to each enterprise. Therefore, a practitioner needs at least a good measure of a software developer's mindset.

Is this going to be hard? Of course it is. BI is IT brain surgery. BI integrates data from multiple data sources for an all-encompassing view, applies that integrated data to rules, executes actions, and waits for the feedback, which results in more data. That sounds like what the brain does to me.

The phrase, "It ain't brain surgery", suggests we think of the brain surgeon as the smartest of the smart (all doctors have to be smart). In the case of the BI consultant, we may not necessarily be the smartest of the smart (the TV character, "House", isn't a brain surgeon, but he is smarter than everyone), but it's hard to argue against the claim that we face more ambiguity and more need for high-end capability across a range of skills than any other IT discipline I can think of.

As an IT consultant I wish for tools that will make my job easier, but I also pray that the field continues to evolve at a snappy pace so there is room on the cutting edge requiring a high level of experience so I can continue to add significant value while I make a fairly decent living. With the growing number of product group and subject matter expert blogs, early-adopter programs like TAP and CTPs from the product groups, fantastic on-line tutorials (free or purchased for really practically nothing), Wikipedia, inexpensive off-shore support, etc, it becomes increasingly hard for a field consultant to add the level of value that justifies the sort of rates the customer is charged.

Predictive Analytics requires more than "best practices" packaged in cook books. Every company is unique and hopefully operates on a unique strategy (otherwise, how can it compete?), which in turn means Predictive Analytics manifests itself differently. It is like software development where the requirements are unique (otherwise why wouldn't you buy an existing package instead?). There is certainly a core of skills requiring a high degree of competency and there are common patterns, but usually no readily reusable solution. Predictive Analytics is at the cutting edge where thinking (so much harder than merely applying) reigns supreme.

Now that data warehousing and even ETL and OLAP ("Gemini" will make my high-end Analysis Services skill a little less needed) are maturing, Predictive Analytics opens the door for BI to continue requiring highly skilled practitioners with the ability to improvise for a long time to come. Customers will require increasingly intelligent automated assistance to wade through oceans of data because of further integration of systems around the world, the sheer increase in volume of wired people and data, the addition of potentially orders of magnitude more mobile-embedded devices and RFID-tagged items onto the Internet, and the growing demands we place on our computers.

For those consultants who have tried to enter BI but were overwhelmed by Analysis Services OLAP, particularly optimization and MDX, Data Mining may prove to be a more feasible entry point into a high-end BI skill. This is particularly true for those from financial backgrounds as I write about in my blog, Data Mining in the PerformancePoint MAP Framework.

Broad and Simple Deployment of Predictive Analytics

What will open the flood gates for Predictive Analytics, at a "data mining for the masses" level, is to point out where Microsoft's data mining is best suited, at least for now. And that is when it is deployed in a broad and simple manner. That is, predictive analytics is embedded in many places throughout an enterprise to support robust, dynamic requests for data, in implementations that are simple enough to develop and maintain without a staff of PhDs.

Figure 2 depicts how a broad deployment of embedded predictive analytics throughout an enterprise will significantly expand the world for Microsoft Business Intelligence. The upper box, colored in navy blue, shows the Microsoft BI world today without Predictive Analytics as a common component.  We see the end result as reports and visualizations such as PerformancePoint dashboards and Excel charts. (Of course, there are many other reports and visualizations such as Reporting Services reports and OLAP cube browsers such as Excel Pivot Tables which aren't depicted here.)

 

Figure 2 - Microsoft's Data Mining is uniquely suited for broad and simple implementations.

The lower box, colored in green, shows a whole new world for Microsoft BI as broadly implemented, embedded predictive analytics. Data Mining models are created from the Enterprise Data Warehouse (EDW) and feed robust, versatile data directly into the enterprise's applications. (Figures 4 and 5, later in this blog, depict the many scenarios for such embedded predictive analytics.) BI as most customers have it implemented today, that is without predictive analytics (the upper box in Figure 2), was always about predicting the future based on the past.

Rather than focusing on the common scenario of a high-end analyst with her PhD in math and MBA solving very complex problems, where SAS and SPSS software is utilized, we focus on the "data mining for the masses" approach. That means data mining models provide simple, relatively straightforward predictions to applications throughout an enterprise.

The BI "Stimulus" Plan - BI Meets Business Process Management

Microsoft's easy-to-use data mining features, combined with Microsoft's wide array of integrating technologies (such as ADO.NET, XML/A, WCF, SSIS, and Linked Servers), enable "Data Mining for the Masses". Except the "masses" can be software applications as well as information workers across an enterprise. Because software applications aren't as smart as humans, it is the simpler decisions that can be delegated (or semi-delegated) to software across an enterprise.

Figure 3 depicts an architecture that goes a long way towards raising IT to the level of "Strategic Asset" in BPIO terms. On the left side of Figure 3, we see two examples of business processes, the sales cycle and manufacturing. In the center, we see an EDW (Enterprise Data Warehouse) capturing metrics from various points in the processes, usually where handoffs are made (such as sales to shipping). From that EDW, data mining models are created which will be queried by the business process application for robust, versatile answers. That forms a cycle (red arrows --> green arrows --> purple arrows) that utilizes BI data to directly drive the actions of the business.

 

Figure 3 - BI and Predictive Analytics raise an IT system to the level of "Strategic Asset".

Towards the upper-right of Figure 3, we also see a mechanism for monitoring the effectiveness (performance) of the mining models; the predictions are compared against the eventual actual value. After all, if the Predictive Analytics will actually contribute to decisions (autonomously or semi-autonomously) its performance should be monitored within a Performance Management framework as would any other Information Worker.  The blue arrows form a cycle of Monitor, Analyze, and Plan.
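Comparing predictions against the eventual actual values can be as simple as tracking an average error and turning it into a KPI-style status. The sketch below, in Python, is a minimal illustration under stated assumptions: the function names, the sample figures, the status labels, and the tolerance are all hypothetical, and mean absolute error is just one of many possible measures.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between each prediction and its eventual actual value."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def model_status(predicted, actual, tolerance):
    """Return a KPI-style status for a mining model (labels are hypothetical)."""
    error = mean_absolute_error(predicted, actual)
    return "on target" if error <= tolerance else "needs review"


# Hypothetical weekly unit forecasts and the actuals that later arrived.
forecast = [100, 110, 120]
actuals = [98, 115, 119]

print(model_status(forecast, actuals, tolerance=5))  # on target
print(model_status(forecast, actuals, tolerance=2))  # needs review
```

A status like this could sit on a scorecard next to the KPIs of human Information Workers, feeding the Monitor, Analyze, and Plan cycle.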

With this in mind, software developers should begin to develop applications with Predictive Analytics in mind. They need to think in terms of probability in addition to absolutes, non-linear thinking in addition to linear workflows, heuristics in addition to algorithms, and dynamic models in addition to static graphs. Instead of stating there is a projected 3% rise in profits for the next quarter, we say "There is a 3% mean rise with a standard deviation of .25."
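To make the "mean with a standard deviation" style of answer concrete, here is a small Python sketch. The three projected rises are invented for illustration, and `describe_projection` is a hypothetical helper; a real mining model would derive its distribution from far more data.

```python
import statistics


def describe_projection(projected_rises_pct):
    """Summarize a set of projected rises as a mean plus a sample standard deviation."""
    mean = statistics.mean(projected_rises_pct)
    spread = statistics.stdev(projected_rises_pct)  # sample standard deviation
    return mean, spread


# Hypothetical projections (in percent) from three scenarios.
mean, spread = describe_projection([2.75, 3.0, 3.25])

# Prints: There is a 3% mean rise with a standard deviation of 0.25.
print(f"There is a {mean:g}% mean rise with a standard deviation of {spread:g}.")
```

The application receiving this answer gets two numbers instead of one, and the second tells it how much to trust the first.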

Figures 4, 5, and 6 focus in on different aspects of Figure 3. Figures 4 and 5 list some of the many examples of Predictive Analytics for the enterprise applications I included. Most of the items listed in Figure 4 should be pretty familiar to you both as an IT consultant and as a consumer. It is said that Artificial Intelligence is here today, but not as it was expected in the 1980s. You would hardly notice it. It arrived in the form of a collection of simpler gems of "intelligent" tasks distributed across all aspects of IT, not in the form of C3PO, HAL, or the Terminator. It checks your credit card transactions for fraud, keeps your favorite item on the store shelves, sorts out your junk mail ...

Some items listed in Figure 4 are simpler than others, but all of them are very much feasible. That means Predictive Analytics can be implemented in baby steps. Predictive Analytics can be implemented iteratively, minimizing the chance for failure. That's an important point. We're not even really out of the days where even run-of-the-mill BI projects are notoriously prone to failure. On one level, attack specific, digestible opportunities for Predictive Analytics, but on another level, look at Predictive Analytics as a whole as a paradigm that is just as much a part of enterprise software as its ability to network and run tasks in parallel.

The items listed also return significant value on the investment. Return on investment can take many forms, for example:

 

Figure 4 - Predictive Analytics for the Sales Cycle Applications.

The essay, Predictive Analytics Drilldown, shows how the "fuzzy" data of Predictive Analytics surfacing on software dialogs can be associated with a mechanism that drills down into the guts of the prediction.

Decoupled Logic and Procedure

Figures 4 and 5 suggest a partial decoupling of logic from procedure, which means that logic can be changed independently of a compiled application. Such a capability goes a long way towards promoting the agility of a system. In this case, the applications shown in Figure 4 (Sales, Finance, etc) require data and make calls to a database as usual. The difference between calling data from a data mining model and calling it from a relational database is that the answer from the data mining model is the result of a complex statistics-based algorithm that provides a "best guess under the circumstances".

Note: A call to a relational database could be a call to a stored procedure which contains its own logic as well. That means the logic in the stored procedure can be changed independently of the calling program. The difference is that the stored procedure contains custom logic written and maintained by human programmers, whereas the data from the data mining model is the result of specific algorithms; the algorithm is constant, although the shape of the "training" data may change affecting the results.

Imagine how frustrating life would be if every question you posed to a human resulted in either the answer or "I don't know". I very much prefer people who offer an educated guess if they don't know the exact answer to a question. They will also tell me how confident they are and provide background on how they formulated the guess. They may also provide more than one educated guess. Think about all of the decisions you make every day. You'll see that many, if not most, are educated guesses you start with (hypotheses) and iteratively confirm (through research and experiment). The growing service of our IT applications is limited if we cannot expect them to serve educated guesses as well as indisputable facts.
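As a rough illustration of an application being served educated guesses rather than a single fact or "I don't know", the Python sketch below ranks past outcomes by how often they occurred and returns more than one guess, each with a probability. This is a naive frequency count used purely for illustration, not one of Microsoft's data mining algorithms; the function name `educated_guesses` and the customer history are hypothetical.

```python
from collections import Counter


def educated_guesses(history, top_n=2):
    """Return up to top_n (outcome, probability) pairs, most likely first.

    An empty list is the "I don't know" answer when there is no history.
    """
    if not history:
        return []
    counts = Counter(history)
    total = len(history)
    return [(outcome, count / total) for outcome, count in counts.most_common(top_n)]


# Hypothetical outcomes observed for past customers in a similar situation.
past_outcomes = [
    "renews", "renews", "renews", "churns",
    "renews", "churns", "upgrades", "renews",
]

for outcome, probability in educated_guesses(past_outcomes):
    print(f"{outcome}: {probability:.1%}")
# renews: 62.5%
# churns: 25.0%
```

The calling application still decides what to do with the guesses; the probabilities tell it how much weight each one deserves.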

Imperfect Information Mechanisms

Of course, the drawback of educated guesses is that they may be wrong. So we must consider options on what to do with these educated guesses provided by Predictive Analytics:

The next step after Predictive Analytics is to address this "iterative recognition" process we use when presented with educated guesses. "Iterative Recognition" and other such concepts such as "Decoupled Recognition and Action" are what I call "Imperfect Information Mechanisms", which I discuss on my Soft-Coded Logic site. Looking back at Figure 1, these concepts yield yet another layer that lives between the Predictive Analytics mining models and the Tank Commander, further adding value down this "data processing chain".

Predictive Analytics begins to address the fact that we live in an unimaginably complex world where it's actually hard to think of examples of decisions made on "perfect information". Practically every decision we make in real life includes factors whose values we are uncertain about ... or we just trust the value out of habit. This is contrary to our current IT systems in general, which can only handle perfect information. Any "exceptions" (discovered through "validation" routines) in our IT systems are diverted to a human to manually track down the perfect answer. These concepts come into play in very advanced usages of Predictive Analytics such as Risk Management/Assessment.

The human brain is an "imperfect information machine". If our human brain hadn't evolved as an imperfect information machine, we would be as un-self-aware as the other animals, with no capability for influencing our future. Ubiquitous assimilation of Predictive Analytics into our IT systems is the first major step towards relieving humans of carrying 100% of the burden of handling imperfect information. As I alluded to earlier, Predictive Analytics is already here, utilized by big companies that can afford it or niche companies where it is core to their business. "Data Mining for the Masses" is there for the taking by the SMB market.

 

Figure 5 - Predictive Analytics for the Manufacturing Plant.

 

Figure 6 - The effectiveness of Predictive Analytics monitored in a MAP framework.

Summary

So let me list the main take-aways as succinctly as I can. For the customer:

For the consultant/architect:

  • Predictive Analytics is a field where the better you can think, the more valuable you are. Business savvy gained over years and years of exposure is fully utilized as well.
  • It is a field that will continue to mature for decades to come; at least until autonomous robots are walking among us.

    A Few Parting Thoughts

    Whether you're a customer or consultant, I hope I got you to think a little of the "what" and the "why" regarding Predictive Analytics. That will lead to the "how" questions which I will address in future blogs. However, remember, this entire blog is just the opinion of one Microsoft BI practitioner attempting to inject more life into Business Intelligence and maybe raise the level of innovative spirit.

    Take my advice with a grain of salt. Although I'm a BI Architect in the MSC BI Global Practice, as they say, "The views expressed here don't necessarily reflect the views of the Microsoft BI leadership". To some people, what I just wrote about is obvious, but you are in the minority, which is the problem. To some, this sounds like pie in the sky, and I hope to have changed your mind. To some, you believe in the value, but were just afraid, and I hope I gave you the inspiration to tackle it and a plan of attack.

    What I do know is that the world is spinning faster and faster. If this merry-go-round crashes, it will crash spectacularly. We will need our computer systems to assist us in an ever more intelligent fashion to help us avoid those spectacular crashes before they happen. I purposely use the phrase "spectacular crashes" as I realize our systems need to help us engineer "controlled crashes" for reasons similar to why we have "controlled forest fires".

    That is where the high-end skills, the thinking skills that separate the "BI doctors" from the "BI Technicians", will be needed. That's where I plan to be as I haven't yet succumbed to the overly-engineered McGoogle way of operating that certainly plays a necessary role, but doesn't inspire much passion in me. My motto is, "I make my living at the cutting edge and my hobbies are at the bleeding edge." Now, the price to pay for living on the cutting edge is that you end up feeling stupid a good deal of the time. That's not for everyone, and it's often not for me as well. But it gets me up in the morning excited about what awaits out there.

    In the BI classes I teach, I tell the students that BI is a tough subject and it could take months or years to get to a point in this broad, ambiguous, highly technical field where you feel like you have a handle on it. But it is worth pouring your heart and soul into it because BI marks the beginning of how humans and machines will begin to relate more in a partnership than as human/tool. I wrote a little article on this over the recent Holiday season titled, The Socialization of Business Intelligence.

    Helpful Stuff:

    Business Modeling and Data Mining, Dorian Pyle - This is the best book out there for teaching you why data mining is such a valuable thing as well as describing how to attack a data mining engagement. I put this book up there with Ralph Kimball's classic The Data Warehouse Lifecycle Toolkit (this book's Data Warehousing counterpart) and the late, great Ken Henderson's The Guru's Guide to SQL Server Architecture and Internals (the relational database counterpart). So this book is best for experienced BI consultants who wish to branch into Data Mining.

    Super Crunchers: Why Thinking-By-Numbers is the New Way to be Smart, Ian Ayres - This is an excellent introduction to the "data mining way of thinking". If you have no idea what this blog is about, this book will convince you that data mining skills will be imperative. I reviewed this book a couple of years ago when it first appeared.

    Statistics for Dummies, Deborah Rumsey - Statistics are the basis for Microsoft's Data Mining. You must start with a good foundation. In all honesty, I've just flipped through this book (since I already have a good enough knowledge of statistics) in order to see if it is sufficient and easily digestible, which it is. There are many books on beginning statistics, and this one seems better than most.

    The Black Swan: The Impact of the Highly Improbable, Nassim Taleb - This book will help you to better understand how statistics works in the real world and, equally important, where its limitations are. I highly recommend this book as a way to further wrap your mind around a data mining mentality.

    Predictive Analysis with SQL Server 2008 and Predictive Analytics for the Retail Industry - These white papers from the SQL Server CAT team offer a good place to start.

    Analysis Services Data Mining Mental Blocks - A little blog I wrote on my personal blog site discussing the mental blocks I see in many folks I encounter as I discuss data mining.

    Bootstrapping your Business Acumen: The "technical" folks reading this may be rusty with the business acumen required in order to effectively use Predictive Analytics to improve business processes. So here are a few books that I actually enjoyed. What the CEO Wants You to Know, Ram Charan - A very nice, concise book that cuts through all the fancy stuff, revealing the core of what business is all about. The 10 Day MBA - This is an excellent boot-camp book. The Agenda: What Every Business Must Do to Dominate the Decade, Michael Hammer - Unfortunately for Michael Hammer, he was hoping for the decade of 2000, not 2010.

    Notes:

    1 The picture of the tank commander is a watercolor by my wife, Laurie Asahara. It won 1st Place in Portraits at the 2008 Idaho State Fair.

    Comments

    # re: Why Isn't Predictive Analytics a Big Thing?

    Here is some feedback from one of my colleagues too shy to comment directly to this site (I posted this with his permission):

    [Colleague] In order to become the next big thing we need lots of wins.

    At a high level, for a customer win we must have:

    1 - a good fit and build

    2 - rock star tank commanders who can (are there talented people onsite?):

    Take ownership after the consultants leave

    Sell the solution up and down the chain

    [EugeneA] Information Workers, workers who use IT-supplied data, range from the highly skilled, such as Tank Commanders, to the not so highly skilled, like a Barista who determines whether to give me my latte for free because they took too long. Some will find data making a bigger impact on their activities than others. The best ad EVER for BI is that Sprint/Blackberry commercial (http://www.youtube.com/watch?v=fN8qorQLz2M) running today, "What if Delivery People Ran the World?" It shows workers of a "FedEx-like" company as school staff, teachers, bus driver, hall monitor, etc, and how they use their mobile devices to discover a student is missing, track down a truant, and convey what to do with him. They aren't using Predictive Analytics (that probably wouldn't have made as good a commercial), but it shows the potential for the wired Information Worker. My point is IWs don't need to be rock stars, but it helps to have rock-star data modelers - the consultants for whom this blog is written. The point of my blog is to convey the incredible value of Predictive Analytics to customers and to demonstrate to consultants or other IT-savvy Information Workers that Data Mining Modelers are the rock stars of the IT future.

    [Colleague] Number 1 takes a lot of consulting/internal talent and hard work to set up but it is somewhat controllable. The tank commanders need proper intelligence (data and rules) in order to be successful in their mission (deliver timely info). This leads to happy lives (paycheck), and heavy reliance upon the system. Some stress factors that keep tank commanders up at night:

    1. crappy intelligence - bad reports

    [EugeneA] Predictive Analytics can help an IW detect the probability of whether a piece of information is suspect or not. Look at it this way: Purposefully crappy intelligence is what crooks are all about and one of the applications for Predictive Analytics is to determine if someone may be a crook (ex: fraud detection).

    [Colleague] 2. delays - missed deadlines

    [EugeneA] This particular issue is not a matter for Predictive Analytics, but for the next step - which I deal with extensively in various articles I've posted on this blog site and www.softcodedlogic.com. Missed deadlines are one variety of event that changes workflow. Predictive Analytics sets the stage, though.

    [Colleague] Most tank commanders are like paid mercenaries.  Paychecks are king. The solution must be a good fit and well built.  So much goes into this and you have outlined this need quite well.

    [EugeneA] Again, the Tank Commander is an Information Worker, not the data modeler (a consultant or someone on staff) who learns what will make a tank commander more successful and builds data models to provide that information.

    [Colleague] Number 2 is more a crap shoot.  Key individuals can make all the difference.  Incompetent people can ruin even the best implementation and in an embedded predictive analytics model the potential for incompetency exists across the globe.  There are lots of moving parts.  

    [EugeneA] One of the goals for Predictive Analytics is to address the fact that the real world does consist of many moving parts. Hopefully, Predictive Analytics embedded wherever one part meets another can inform us of the best course of action or warn us of something that may slip our minds that really shouldn't.

    [Colleague] It's a complex world so no matter how well built, we cannot anticipate everything.  

    [EugeneA] We can't anticipate everything, which is what statistics-based Predictive Analytics addresses by exposing patterns that guide us towards recognizing what is going on. In data mining we speak in terms of probability, standard deviations and confidence in the predictions. A data mining prediction is just a guess based on past history; a pretty good starting point. However, a figure accompanied by a high standard deviation tells us to still be on the lookout for something else. If we could anticipate everything, that means the rules are well-defined and we could write a conventional (albeit perhaps complicated) program. In the IT world, our systems are protected from the vagaries of the outside world like an over-protective parent. Under such conditions, we can anticipate everything.
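
    As a rough illustration of the point about standard deviation, here is a minimal Python sketch. It is not how Analysis Services computes its predictions; the function names, the sample figures, and the 25% threshold are all assumptions made up for this example. It treats the average of past values as the "guess" and flags the guess for a second look when the spread is large relative to the value.

```python
from statistics import mean, stdev

def predict_with_spread(history):
    """Return (prediction, standard deviation) from past observations.

    The prediction here is simply the historical average; a real data
    mining model would be far richer. This is an illustrative stand-in.
    """
    return mean(history), stdev(history)

def needs_second_look(prediction, spread, max_relative_spread=0.25):
    """Flag a guess whose spread is large relative to its size.

    max_relative_spread is a hypothetical threshold chosen for this sketch.
    """
    if prediction == 0:
        return True  # a relative spread is undefined, so be cautious
    return spread / abs(prediction) > max_relative_spread

# Two made-up histories with the same average but very different spreads.
steady = [100, 102, 98, 101, 99]
erratic = [100, 40, 160, 20, 180]

for label, history in (("steady", steady), ("erratic", erratic)):
    guess, spread = predict_with_spread(history)
    flag = needs_second_look(guess, spread)
    print(f"{label}: guess={guess:.1f}, std dev={spread:.1f}, look closer={flag}")
```

    Both histories produce the same guess, but only the erratic one is flagged, which is the "still be on the lookout" signal described above.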

    [Colleague] Maybe with all the moving parts, it's tough to package and sell?  

    [EugeneA] Predictive Analytics doesn't really add many, if any, moving parts; mostly, it makes existing parts "smarter". The moving parts are already there in the Business Process, hopefully integrated with EAI (Enterprise Application Integration) pieces. The "SOA Play for BI" (Figures 3 and 4) scenario simply takes measurements at the already existing EAI "handoff" points. Or, Predictive Analytics builds on top of an existing BI system where the tough task of integrating data is already done (as Figure 2 shows). Each item listed in Figure 4 can be implemented independently.

    Tuesday, February 24, 2009 8:24 AM by EugeneA

    # re: Why Isn't Predictive Analytics a Big Thing?

    M says: Great article.  What are your thoughts on what Sharepoint can do to aid in delivering "predictives"?

    EugeneA says:

    Thanks!

    As for the "Tank Commander" (operational usage by front-line information workers) sort of use of predictive analytics, I'm hoping SharePoint plays a role similar to the one it plays in Performance Management (PM), where KPIs are delivered in a widespread manner to Information Workers. The difference is that Predictive Analytics values would appear on more of a "cockpit" display than a scorecard. Predictive Analytics values are values just like those from a regular relational database, except they are educated guesses that come with extra stuff: standard deviation and probability. That's not so different from the notion that a KPI is just a value with other stuff as well: goal, trend and status.

    Similar to the display for a KPI which uses a red/yellow/green traffic light as a top-level display (to indicate the KPI's status), the top-level display ("confidence") of a Predictive Analytics value (an educated guess) would be a "normal distribution curve", the familiar bell curve. The width of the bell curve indicates the distribution of the possibilities (the narrower, the more confidence).
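
    To make "the narrower, the more confidence" concrete, here is a small Python sketch. It assumes the prediction's uncertainty really is bell-shaped (normally distributed), and the function name, the predicted value, and the tolerance are invented for illustration. It asks how likely the actual value is to land within a fixed tolerance of the predicted value for a narrow curve and for a wide one.

```python
from statistics import NormalDist

def probability_within(predicted, std_dev, tolerance):
    """Probability that the actual value falls within +/- tolerance
    of the predicted value, assuming a normal distribution."""
    dist = NormalDist(mu=predicted, sigma=std_dev)
    return dist.cdf(predicted + tolerance) - dist.cdf(predicted - tolerance)

# Same predicted value, two different bell-curve widths (made-up numbers).
predicted = 500
narrow = probability_within(predicted, std_dev=5, tolerance=10)
wide = probability_within(predicted, std_dev=20, tolerance=10)

print(f"narrow curve: {narrow:.1%} chance of landing within 10 of the guess")
print(f"wide curve:   {wide:.1%} chance of landing within 10 of the guess")
```

    The narrow curve puts roughly 95% of the possibilities within the tolerance, while the wide curve puts well under half of them there. A "cockpit" display could convey that same difference with the shape of the curve.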

    Similar to how KPIs utilize secondary displays such as line graphs to illustrate the trend over time leading to the current status of the KPI, Predictive Analytics would employ secondary displays such as scatter charts illustrating relationships between two points of data incorporated into the prediction. Both KPIs and Predictive Analytics could use OLAP to analyze breakdowns to pinpoint causes.

    Additionally, there is the notion of "Predictive KPIs". For example, a predicted actual against a target at the beginning of a period (before actuals start accumulating). This helps you to validate the feasibility of your set targets.

    The reason why I'm pointing out analogies between KPIs and predicted values is that in real life, there isn't such a thing as a scalar value. Every measurement comes with qualifications.

    Friday, February 27, 2009 7:23 PM by EugeneA

    # re: Why Isn't Predictive Analytics a Big Thing?

    [Rick Schultz] Eugene,

    Excellent article.  I like how you portrayed many of the underlying relationships in the BI world.

    I think my current experiences may shed some light for you in this regard.

    First off, remember that Microsoft sells primarily into the SMB space.  They have a much different understanding of what they "need" than the (usually) more sophisticated users at a large company.

    [EugeneA] I'm hoping that the SMB market is ready to look at Predictive Analytics in the same "BI for the masses" mentality that the OLAP side of Analysis Services presented about a decade ago. Analysis Services' OLAP and Data Mining were targeted at the SMB market. That is to hopefully help give them some parity with the Big Boys.

    However, my particular role does result in my spending more time in the larger businesses, where I believe both Analysis Services OLAP and Data Mining are up to the task.

    [Rick Schultz] That said, here are 3 projects I'm currently involved in.  The experiences are in-line with what I normally see in my career:

    1)  One customer has me developing reports, and adding a bit of automation.

    According to senior management, they are one of only 3 significant providers in their industry, and the current economy could help them rapidly take over the number one spot (and almost monopolize the market).  At the same time, senior management has NO visibility into many of the ad-hoc metrics they need to act on this opportunity.  I can't even get them to take the time to write down the kinds of analyses that they want - they're too busy.  I refer to this as "can't see the forest for the trees"; too many business execs are too busy treading water to focus on the fuzzy shape of land on the horizon.

    [EugeneA] "The fuzzy shape of land on the horizon" ... I like that! This scenario of jumping on opportunities is certainly the most glamorous. It will probably end up that, for the most part, only companies (big and small) in a business that is by nature highly dynamic (like hedge funds or the CIA) will ever actually do opportunistic predictive analytics such as this.

    [Rick Schultz] 2)  Another client is trying to determine if they'll implement a system they asked me to scope.  For less than one annual salary, they would have a complete system to, effectively, run their company.  Early estimates are productivity gains of 50% or more.  This customer wants to grow by more than 100% over the next 5 years.  However, they're coughing over the initial outlay necessary.  It's funny - they will likely spend much more than that outfitting a physical warehouse, which only adds productive capacity, rather than kitting out a data warehouse which multiplies capacity.  This is another common issue - smaller company managers see costs at a personal level, and spending tens of thousands of dollars on something you can't even really touch is just darn hard to figure out.

    [EugeneA] This scenario helps me to understand that my efforts in the MCS Global Practice need to place more emphasis on conveying ROI. The ROI that Predictive Analytics presents can potentially be astounding and revolutionary. Of course, those sorts of claims raise red flags as well :)

    I know that's all easier said than done, but again my challenge is to hit the obstacles in the way of pervasive PA in the MSFT world head on.

    [Rick Schultz] The third issue is a bit different.  I'm developing my own software solution right now, a BI tool that extends Dynamics GP.  We're getting close to the end of the development, and my developers are bright, capable, and experienced.  However, like most developers you'll find in the SMB market, they don't have a lot of direct business experience, and some of the requirements are very hard for them to grasp.  We spent over an hour discussing a very minor point that would be intuitive to a user because they don't have that mindspace available to them.

    [EugeneA] I'm hoping that "recovered financial/accounting" people (generally with more years under their belt as well) who have the deep business savvy that can be learned only through years of experience can learn about the technical side of predictive analytics. If the tools are simple enough and the methodologies adequately outlined, this is feasible.

    [Rick Schultz] So, in a nutshell, the things that keep back adoption of data mining/predictive analytics are:

    1)  Management too busy to think through the business in order to direct the developers to use PA,

    2)  Management has trouble accepting the long implementation & high cost for something they can't touch,

    3)  Developers struggle to understand (and therefore model) many of the finer points of the models they need to build.  Part of the problem is that those parts are often intuitive to the end user, so the end user doesn't really know how to explain in sufficient detail (and using the right vocabulary) to pass the knowledge on to the developer.

    These are all people problems and, until they are addressed (by the people themselves!) PA and other forms of advanced BI simply won't happen.

    [EugeneA] I couldn't agree with you more. Thinking through how to present ROI in predictive analytics terms, presenting predictive analytics as a competitive edge, etc., is what I hope to make headway on over the next couple of months. The value of PA is indisputable, but there are mental blocks on the part of both customers and consultants.

    Great insights Rick!

    Tuesday, March 03, 2009 7:58 AM by EugeneA

    # Imperfect Information

    I posted an essay I worked on this past weekend on my SCL site: http://www.softcodedlogic.com/imperfectinformationdirect.aspx

    The ability to handle "Imperfect Information" is the point of intelligence. Predictive Analytics is a step towards it, so I thought it may be an interesting read if you liked this blog.

    EugeneA

    Monday, March 09, 2009 9:25 AM by EugeneA