Ubiquitous Predictive Analytics
Predictive Analytics Drilldown

by Eugene A. Asahara
Created: February 5, 2009
Last Updated: March 17, 2009


If fuzzy data supplied by data mining models is incorporated widely into everyday software applications such as CRM applications, a ubiquitous feature of dialog windows could be command buttons bearing the "almost equal to" symbol (the squiggly equal sign), as shown in Figure 1. Such a button would open a dialog allowing a user to explore the composition of its associated value. Because the values of most data carry some degree of uncertainty, it would be very helpful to display statistically derived data (or "statistically decayed" data - data that may become invalid over time) alongside the data whose values we normally consider certain. I call this action "Predictive Analytics Drilldown" and the button the "Predictive Analytics Drilldown Button".

Figure 1 depicts a simple but typical CRM dialog assisting a salesperson in selling more books to customers. It consists of a few pieces of data identifying the customer and a list of titles a purchaser of "Super Crunchers" may also like.


Figure 1 - Predictive Analytics Drilldown buttons.

The list of books illustrates a typical cross-sell use of Predictive Analytics. However, the point of interest for this article is the Predictive Analytics Drilldown buttons next to the customer's Name, Address, and Home Phone. Because this company hasn't had contact with this customer for a significant period of time (26 months), there is a possibility that these pieces of information are no longer correct. For various reasons, people change their names, move, and change their phone numbers.

It would be very helpful for the CRM application to proactively tell me if there is a statistically significant chance that these events have occurred. Notice that the address's Drilldown button is green and the phone number's Drilldown button is yellow. These non-black colors (red --> yellow --> green) indicate the probability that their respective values are invalid, from most likely (red) to least likely (green). For the address, green indicates that it is most likely correct but could be wrong. For the phone number, yellow indicates a significant chance that the number is wrong.

The Drilldown button for Name is black, indicating certainty (or at least a calculated probability of 1 - geekspeak for "no doubt about it"). The button still appears so that a human can drill down anyway since ... well ... the mining model could still be wrong (maybe somebody lied).
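The color coding described above could be implemented as a simple threshold function over the probability that a value is still valid. The Python sketch below is only an illustration: the function name `drilldown_button_color` and the 0.8 and 0.5 cut-offs are hypothetical, since the article fixes only the ordering of the colors and reserves black for a probability of 1.

```python
def drilldown_button_color(p_valid: float) -> str:
    """Map the probability that a displayed value is still valid to a button color.

    The 0.8 and 0.5 thresholds are hypothetical; only the ordering
    red -> yellow -> green and black-for-certainty come from the article.
    """
    if not 0.0 <= p_valid <= 1.0:
        raise ValueError("p_valid must be between 0 and 1")
    if p_valid == 1.0:
        return "black"   # no doubt about it, but the drilldown stays available
    if p_valid >= 0.8:   # hypothetical cut-off: most likely correct
        return "green"
    if p_valid >= 0.5:   # hypothetical cut-off: significant chance of error
        return "yellow"
    return "red"         # more likely wrong than right


if __name__ == "__main__":
    # Hypothetical probabilities for the three fields in Figure 1.
    fields = {"Name": 1.0, "Address": 0.9, "Home Phone": 0.6}
    for field, p in fields.items():
        print(f"{field}: {drilldown_button_color(p)}")
```

Keeping black separate from green preserves the distinction the article draws between "certain" and "most likely correct", while still rendering a button in both cases.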

Remember, the CRM application is just presenting information to the user. It's still up to the user to decide what to do. Proactively telling the user that there is a chance the data isn't valid minimizes "false negatives". In other words, if the user doesn't know there is a chance the data is wrong, the user remains unaware of a problem; ignorance is not bliss. Additionally, the color of the Drilldown button helps the user gauge how likely the error is, which helps minimize "false positives".

Note: True Positives are the goal of smart systems. The article, Data to Information to Data to Information, is pretty much about how SCL resolves the True Positives issue.

If the salesperson decides he/she needs to investigate the possibility of wrong data, the salesperson could drill down into one of those identifying values, say the address, to see its probability along with additional information on how that probability was calculated. Figure 2 shows the Predictive Analytics Drilldown dialog. In this case, the dialog is associated with the customer's Address and gives the user access to the data mining model related to that value.


Figure 2 - Predictive Analytics Drilldown dialog.

At the top of the dialog is a combo box listing each predictable attribute for the associated value. Here the selected predictable attribute is "Average Length at Residence". Just below the combo box is the prediction for "Average Length at Residence", which is "5-10 Years". The salesperson can reasonably conclude that because only 26 months have passed since the last contact with the customer and the "Average Length at Residence" for this customer is 5 to 10 years (at least 60 months), this customer probably still lives at that address.
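The prediction shown below the combo box can be thought of as the most probable state of the predictable attribute, and the salesperson's conclusion is a comparison of that state against the time since last contact. A minimal Python sketch, assuming the model exposes a probability for each discretized bucket; the `Bucket` type, the helper names, every bucket boundary other than "5-10 Years", and all of the probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """One discretized range of the predictable attribute."""
    label: str
    min_months: int  # shortest stay that still falls in this bucket


def predict(distribution: dict[Bucket, float]) -> tuple[Bucket, float]:
    """Return the most probable bucket and its probability, i.e. the value
    and confidence shown just below the combo box."""
    bucket = max(distribution, key=distribution.get)
    return bucket, distribution[bucket]


def probably_same_address(predicted: Bucket, months_since_contact: int) -> bool:
    """The salesperson's reasoning: if even the shortest stay in the predicted
    bucket is longer than the time since last contact, a move is unlikely."""
    return months_since_contact < predicted.min_months


# Hypothetical buckets and probabilities; only "5-10 Years" comes from Figure 2.
UNDER_2 = Bucket("< 2 Years", 0)
FROM_2_TO_5 = Bucket("2-5 Years", 24)
FROM_5_TO_10 = Bucket("5-10 Years", 60)
OVER_10 = Bucket("> 10 Years", 120)

distribution = {UNDER_2: 0.05, FROM_2_TO_5: 0.20, FROM_5_TO_10: 0.55, OVER_10: 0.20}
bucket, probability = predict(distribution)
print(bucket.label, probability)          # 5-10 Years 0.55
print(probably_same_address(bucket, 26))  # True
```

Using the bucket's lower bound is the conservative reading of the article's "(at least 60 months)": it only concludes "no move" when even the shortest predicted stay outlasts the 26 months of silence.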

The "Factors" listed in the drilldown dialog expose the "rule" used to calculate the predicted value and probability. What is great about data mining models is that these rules are adjusted when the models are retrained. This decouples the logic from the procedure, which allows rules to change automatically with changing times. Looking at the factors, common sense agrees that an educated, married man with children, earning a modest income, would not change residences very frequently.
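To illustrate how factors can come from the data rather than from hand-written logic, the Python sketch below derives them by counting: each of the customer's attribute values is ranked by its lift toward the predicted value, P(predicted | attribute = value) / P(predicted). Lift is a swapped-in technique, not necessarily what any particular mining algorithm reports, and the function names, attribute names, and training rows are all hypothetical.

```python
from collections import Counter, defaultdict


def train(rows: list[dict[str, str]], target: str) -> dict:
    """Count co-occurrences of each attribute value with each target value.
    Retraining is simply calling train() again on fresh rows."""
    target_counts = Counter(row[target] for row in rows)
    joint = defaultdict(Counter)  # (attribute, value) -> counts of target values
    for row in rows:
        for attribute, value in row.items():
            if attribute != target:
                joint[(attribute, value)][row[target]] += 1
    return {"n": len(rows), "target": target_counts, "joint": joint}


def factors(model: dict, customer: dict[str, str], predicted: str) -> list[tuple[str, str, float]]:
    """Rank the customer's attribute values by lift toward the predicted value.
    Lift above 1 supports the prediction; below 1 argues against it."""
    base = model["target"][predicted] / model["n"] if model["n"] else 0.0
    ranked = []
    for attribute, value in customer.items():
        counts = model["joint"].get((attribute, value))
        if not counts or base == 0:
            continue  # value never seen in training, or target never seen
        lift = (counts[predicted] / sum(counts.values())) / base
        ranked.append((attribute, value, round(lift, 2)))
    return sorted(ranked, key=lambda f: f[2], reverse=True)


# Hypothetical training rows.
rows = [
    {"Education": "Bachelors", "Marital Status": "Married", "Stay": "5-10 Years"},
    {"Education": "Bachelors", "Marital Status": "Single", "Stay": "< 2 Years"},
    {"Education": "High School", "Marital Status": "Married", "Stay": "5-10 Years"},
    {"Education": "High School", "Marital Status": "Single", "Stay": "< 2 Years"},
]
model = train(rows, "Stay")
customer = {"Education": "Bachelors", "Marital Status": "Married"}
print(factors(model, customer, "5-10 Years"))
# [('Marital Status', 'Married', 2.0), ('Education', 'Bachelors', 1.0)]
```

Because retraining just recounts, adding rows in which married customers move quickly would lower the "Married" lift on the next run with no change to the code, which is the decoupling of logic and procedure described above.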

The user would apply her "human intelligence" to weigh whether it is worth taking the time to drill down on one or more of these fields. The factors involved in that decision include:

Another nice feature would be a semi-transparent distribution curve in the text box that lets the user quickly judge how much confidence to place in the value shown.