Data to Information to Data to Information (DIDI)

An SCL Solution to the Shortcomings of Business Intelligence Dashboards

 

By Eugene A. Asahara

March 22, 2008

 

Introduction

Business Performance Management software (BPM, or just PM) brings monitoring dashboards to the forefront of the workplace. Information Workers (IWs) at all levels and job types view different versions of dashboards, each tailored to their specific job description. Dashboards are UI views consisting of (1) a number of metrics that inform users of their current performance in light of stated goals and (2) supporting reports. The purpose of such dashboards is to keep the metrics that measure performance on the job “in the user’s face”. Dashboards are an effective and efficient means to ensure that important matters are not overlooked and that action is taken only on the basis of fresh, current information.

In business intelligence we often say we turn data into information, which is what today’s dashboards do so well. However, as the world grows more complex, dashboards for certain workers could become so complex that all of the “stuff” being displayed devolves into an incomprehensible mess. At that point we can say that we’ve turned information back into data. (You might otherwise describe the dashboard dilemma with a “herein lies the rub” statement: the sophistication of systems and knowledge in the world grows at a rate that creates a widening deficit against the brain’s fairly static learning capacity.)

This article is about recruiting some assistance from the AI world to then turn that mess of data back into information. I propose to use SCL to bring complex dashboards to at least the second level of SCL Intelligence, Robust Recognition.

Dashboards (Simple) and Cockpit Displays (Complex)

I categorize Performance Management “dashboards” into two types: simple, like the ones we have in our cars, and complex, like the ones found in the cockpit of a jet plane.  

Simple ones, which I will refer to in this article as “dashboards” and which are the currently favored sort in the Performance Management world, display a handful of key performance indicators (KPIs) on a scorecard with a few supporting visuals such as bar charts or line graphs. These are the dashboards that employ styles such as the Balanced Scorecard and are implemented with software such as PerformancePoint Server 2007.

Complex ones, which I will refer to in this article as “cockpit displays”, contain arrays of graphs and gauges that provide everything IWs need to do their jobs. The displays are generally low-latency, some at or near real time, with most data no more than a few minutes old. A good example of cockpit displays is those used by database administrators to monitor the health of mission-critical database servers. These complex displays depict arrays of graphs showing the measures of dozens of metrics, such as Windows PerfMon counters and SQL Server Profiler events, over time. In addition to the sheer number of graphs on these displays, some of the graphs themselves may be complex, showing sets of points or lines combined with bars.

Dashboards

Dashboards have been designed to be easy to read. An IW can understand what is being said with minimal time and training. What is displayed is relevant to the current strategy of the IW’s organization (company or department) and targeted to the role of the user viewing the scorecard, which depicts high-level values into which the user can drill down for deeper levels of investigation. Users of dashboards require only a periodic “sanity check” to ensure that their activities are on track with the plan.

Dashboards work very well when the IW’s tasks are executed in a well-defined, repeatable manner. In other words, the likelihood of unexpected, rare conditions is low and the result of failing to adequately handle such conditions is relatively benign. Such IWs include line workers. On the other end of the IW spectrum, dashboards are useful for high-level, strategic IWs (executives) whose actions are based on high-level summarizations. However, unlike line workers, an executive’s failure to adequately handle unexpected conditions can be catastrophic.

Cockpit Displays

Cockpit displays, on the other hand, are difficult for non-specialists to comprehend. They certainly do look impressive, especially in a demo, but they require highly skilled IWs to wade through and comprehend the ocean of presented metrics within a short period of time (before the decision becomes a moot point). For such IWs, “short period of time” can mean anywhere from a second to a couple of minutes.

Cockpit displays are intended for highly skilled, highly focused professionals such as airline pilots, doctors, and Homeland Security analysts, where failure to successfully address unexpected events is costly. Not only do cockpit users require a high level of skill in the subject matter portrayed by the display, but there may be a significant period of ramp-up time before the user is able to discern what “is being said” by the cockpit display.

It’s not that what is shown on the cockpit display is too much information, as all of those metrics will play into decisions made by the information worker at one time or another, under both expected and unexpected circumstances. After all, if someone took the trouble to include it on a dashboard, someone thought it had value. The rest of this article focuses on the need for cockpit displays and their drawbacks (and how those drawbacks can be ameliorated).

The Major Shortcoming: One Can’t Fight What One Can’t See

The mantra of dashboards is, “What gets measured gets done.” For cockpit displays, the mantra could be “One can’t fight what one can’t see.” Further, “what one can’t see” takes three forms:

  1. What we don’t yet know. This is the form most familiar to us. How many ways can a day go bad? When we do find out another way a day can go bad, we simply add another rule so we aren’t fooled again.
  2. Being fooled on purpose. This is how predators sneak up on prey. A skilled predator essentially makes its prey think nothing is wrong (all KPIs are green) until it is close enough to strike.
  3. Lost in the haystack; information overload. A great example is the legal tactic of flooding your opponent with so much documentation that they cannot possibly assimilate it all.

The first two may seem similar, but there is a big difference. With #1, after getting hit on the head with a coconut while standing under a coconut tree, we learn not to stand under any tree with fruit. With #2, the enemy adapts. If the coconut tree were intelligent and knew we’ve learned not to stand under coconut trees with coconuts, it would hide its coconuts, leading us to believe it’s safe to stand under it.

Dashboards and cockpit displays each have their place. However, is a dashboard adequate for the CEO of a major bank who could cost millions of investors billions of dollars by not catching an unexpected disaster? What about an overworked air traffic controller barely capable of comprehending the positions of hundreds of airplanes? Failure for the CEO would be a matter of being trapped by false negatives: not being aware of disasters about to occur. Failure for the air traffic controller is a matter of being overwhelmed by too many false positives: being inundated with oceans of information beyond a human’s ability to quickly comprehend, what most people think of as “too much information”.

So how can we provide Performance Management portals for CEOs and air traffic controllers that adequately resolve problems of false negatives and false positives, respectively? I’ll begin to answer that question by presenting two notions.

What’s the Difference between Data and Information?

The first is to understand how I define data and information. There are three conditions that must be met for data to be information:

1. We seek data or information for the purpose of taking an action. In other words, we take an action when each member of a set of conditions meets some value or is within some range of values. The need to decide upon taking an action is what makes data useful, or rather is what distinguishes information from mere data.

2. We have “data” and not information if we have no way to analyze it, because the volume and/or complexity of the data is beyond the ability of the human brain and its tools (analytics software) to comprehend it within the required period of time.

3. The data must be adequately aggregated. For example, a set of numbers is simply a set of numbers, each element meaningless by itself, until we graph it, study the graph, and conclude something like, “That graph represents growing sales”. For OLAP folks, this adds a new connotation to “aggregation”.

(In case you’re thinking about the act of thinking, which involves no discernible physical action, we’re playing virtual what-if with symbols in our brain.)

Seeing is Believing?

The second notion is that we live in a complex world where, outside the white walls of our well-defined, repeatable business processes, nothing ever happens exactly the same way twice. This is why the jet pilot’s cockpit display contains everything that’s available to show. We still need to depend on human intelligence to recognize things in a robust manner.

It’s towards cockpit displays that I mostly apply the term “data to information to data to information” (DIDI). As I mentioned in the introduction, in business intelligence we often say we turn data into information. An example would be plotting a set of numbers into a line graph clearly depicting something like rising or falling sales. We say that graph is now information. However, if we look at an extreme cockpit display with, say, 100 such graphs, it’s overwhelming to our brains and all of those pieces of information (graphs) once again become data (data to information to data).

True Positives

With those two notions in mind, I’d like to present my solution to turning that data (the hundred graphs and gauges on a cockpit display) back into information. We’ll begin by reviewing the process by which the mainstream software vendors progressively incorporate “intelligence” into their software. An application to recognize bad things at airport security from a scanner is a good example since we’re all very familiar with the constantly changing and growing list of rules and the dire consequences for failure. Reviewing this process will place my proposed solution into context.

Note: Before continuing, I’d like to say here that software in development for purposes such as automatically recognizing threats at airports would use very sophisticated components that the mainstream business application market doesn’t yet incorporate. My goal for this article, and for SCL in general, is to bring such components to the mainstream. This example is intended as a demonstration of how mainstream tools would be programmed for this application.

The first thing I’d want my software to detect is, obviously, handguns. I know they are metallic and know they mostly adhere to a roughly similar shape. So, the software is programmed to detect metal objects with a pipeline appendage (the barrel), an oblong appendage positioned at roughly 70-120 degrees relative to the pipeline appendage (the handle), and so on. A whole bunch of IF-THEN-ELSE statements are incorporated into the software to detect such conditions.

This programming is indeed effective at detecting handguns. However, terrorists quickly realize this and apply another tactic of using liquid explosives. The software is not yet programmed to look for this, so at this point it fails to protect passengers due to a false negative. The false negative is rectified by programming in a set of conditions to detect liquid explosives. However, the terrorists quickly adapt by employing some new tactic, again and again. We counter each new tactic by adding another condition, thereby removing that potential false negative.
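To make this IF-THEN-ELSE style concrete, below is a minimal C# sketch of that kind of hard-coded rule checking. ScannedObject, its properties, ThreatRules, and the thresholds are all hypothetical, invented for this illustration rather than taken from any real scanner software; the point is only that each newly discovered tactic becomes another hand-coded condition.

using System;
using System.Collections.Generic;

// Hypothetical stand-in for whatever the scanner software extracts from an image.
public class ScannedObject
{
    public bool IsMetallic { get; set; }
    public bool HasPipelineAppendage { get; set; }      // a barrel-like shape
    public double AppendageAngleDegrees { get; set; }   // angle of an oblong appendage (the handle)
    public bool ContainsLiquid { get; set; }
}

public static class ThreatRules
{
    // Each newly discovered tactic becomes another hand-coded condition.
    public static IEnumerable<string> Evaluate(ScannedObject obj)
    {
        // The original handgun rule: metallic, barrel-like appendage,
        // handle at roughly 70-120 degrees to the barrel.
        if (obj.IsMetallic && obj.HasPipelineAppendage &&
            obj.AppendageAngleDegrees >= 70 && obj.AppendageAngleDegrees <= 120)
        {
            yield return "Possible handgun";
        }

        // The rule added after the liquid-explosive tactic appeared.
        if (obj.ContainsLiquid)
        {
            yield return "Possible liquid explosive";
        }

        // ...and so on, one IF block per known tactic, round after round.
    }
}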

After going round after round like this, we have two undesirable problems with our software. One is that there are so many things to check for, though that’s not so bad for a machine; a human security agent, unlike the machine, would be overwhelmed by the need to check for hundreds of potentially dangerous objects. The second problem is more interesting: the problem of too many false positives.

In critical situations such as airport security, the condition of an inundation of false positives is better than the condition of false negatives. That is, until the number of false positives exceeds the capacity to effectively recognize critical situations within the required time (when there’s still time to avert it). This is similar to the state of current Internet search engines. Search engines return false positives knowing it’s better for us to “manually” sift through large numbers of hits rather than not even having the chance to judge the value of a hit.

It’s worth noting too that an overwhelming list of false positives isn’t just a function of the many different things to look for (in this case weapons), but of the many different forms they may take. Guns, for example, certainly can take many different forms. Understanding that there are many ways an object of interest can be recognized is what I mean by “robust recognition”.

What would be ideal is to receive only “true positives” (or at least minimize the number of false positives), which is what SCL is all about. For Performance Management, we can achieve true positives for cockpit displays by running SCL in the background, inferring conditions from facts presented in the unwieldy array of charts, gauges and graphs, and even from data outside the cockpit display instance.

SCL to the Rescue

Similar to how OLAP integrates data from multiple sources and presents totals at varying levels, SCL integrates facts from multiple sources and presents inferences based on rules. For this particular application of SCL in cockpit displays, the data sources are the charts, gauges and graphs of a cockpit display. For CEO dashboards, SCL could also run in another process, monitoring conditions and displaying any inferred conditions of interest in a single report on the CEO’s dashboard.

The high-level explanation of this methodology as it pertains to cockpit displays is that the vast array of graphs and charts will raise events stating what each is currently portraying, and these events are fed into an SCL ADO.NET instance (essentially a mini client-based database). The events will be determined by code that interrogates and analyzes the data points displayed by the graphs for states such as “this represents an upward trend”, “the values are erratic”, or “this represents a huge spike”.
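As an illustration of that interrogation step, here is a sketch of how a single graph might characterize its own data points and raise an event describing its current state. MetricGraph and GraphStateChangedEventArgs are hypothetical names invented for this article, the characterization logic is deliberately crude, and how the event handler forwards the state into the SCL ADO.NET connection is left abstract, since that API isn’t shown here.

using System;
using System.Collections.Generic;
using System.Linq;

public class GraphStateChangedEventArgs : EventArgs
{
    public string MetricName { get; set; }
    public string State { get; set; }   // e.g. "upward trend", "erratic", "huge spike"
}

public class MetricGraph
{
    public string MetricName { get; private set; }
    private readonly List<double> _points = new List<double>();

    public event EventHandler<GraphStateChangedEventArgs> StateChanged;

    public MetricGraph(string metricName) { MetricName = metricName; }

    public void AddPoint(double value)
    {
        _points.Add(value);
        string state = CharacterizePoints();

        // The handler for this event would assert the new state as a fact
        // in the background SCL instance.
        EventHandler<GraphStateChangedEventArgs> handler = StateChanged;
        if (handler != null)
            handler(this, new GraphStateChangedEventArgs { MetricName = MetricName, State = state });
    }

    // Deliberately crude characterization, just to show the idea of
    // turning raw data points into a named state.
    private string CharacterizePoints()
    {
        if (_points.Count < 3) return "insufficient data";

        double last = _points[_points.Count - 1];
        double averageOfRest = _points.Take(_points.Count - 1).Average();
        if (last > averageOfRest * 2) return "huge spike";

        bool monotonicallyRising = true;
        for (int i = 1; i < _points.Count; i++)
            if (_points[i] < _points[i - 1]) { monotonicallyRising = false; break; }

        return monotonicallyRising ? "upward trend" : "erratic";
    }
}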

The SCL instance will apply these changing facts towards recognizing conditions of interest in a “robust” manner that can be gleaned from the state of each graph in the vast array.

For example, a SQL Server DBA could see a set of graphs showing CPU utilization, bytes written/read, and a list of processes running in the SQL Server instance. If there are many bytes written and read and many processes running, high CPU utilization is expected; therefore, the DBA shouldn’t be bothered. But if bytes written and read are low, as is the number of processes in the SQL Server instance, high CPU utilization means something fishy is going on and we should bother the DBA.
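Expressed in plain C# rather than in SCL itself (whose rule syntax isn’t shown in this article), the DBA rule amounts to something like the following sketch; ServerFacts and every threshold here are invented purely for illustration.

// "High CPU alongside heavy I/O and many processes" is expected and ignored;
// "high CPU with little I/O and few processes" is the condition of interest.
public class ServerFacts
{
    public double CpuUtilizationPercent { get; set; }
    public double BytesReadWrittenPerSec { get; set; }
    public int RunningProcessCount { get; set; }
}

public static class DbaRules
{
    public static bool SomethingFishy(ServerFacts facts)
    {
        bool highCpu = facts.CpuUtilizationPercent > 90;        // illustrative threshold
        bool lowIo = facts.BytesReadWrittenPerSec < 1000000;    // illustrative threshold
        bool fewProcesses = facts.RunningProcessCount < 5;      // illustrative threshold
        return highCpu && lowIo && fewProcesses;                // bother the DBA only when all three hold
    }
}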

The results of the background SCL will be displayed in a grid listing the conditions of interest that have been recognized. This grid is just another sort of “report”. This is how I envision it for both cockpit displays and dashboards for CEOs.
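A minimal sketch of that grid, assuming nothing more than plain ADO.NET: inferred conditions are collected into a DataTable that any grid control can bind to, making it just another report. The class and column names below are my own invention.

using System;
using System.Data;

public static class InferredConditionsReport
{
    // Build the table backing the "conditions of interest" grid.
    public static DataTable CreateTable()
    {
        DataTable table = new DataTable("InferredConditions");
        table.Columns.Add("Condition", typeof(string));
        table.Columns.Add("Source", typeof(string));
        table.Columns.Add("InferredAt", typeof(DateTime));
        return table;
    }

    // Called whenever the background SCL instance recognizes a condition of interest.
    public static void Add(DataTable table, string condition, string source)
    {
        table.Rows.Add(condition, source, DateTime.Now);
    }
}

Binding the table to a DataGridView (or an equivalent web grid) places the recognized conditions on the dashboard alongside the existing reports.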

What I propose is that it becomes common practice to have SCL monitoring variables in applications in the background, constantly inferring conditions of interest as data is updated and users navigate around the display. This is akin to the Spell Checker running in the background as I type this in Word, or IntelliSense running in the background and proactively presenting information to me as I write C# code. SCL represents a single platform providing sufficiently robust but relatively easy-to-use features on which such background “intelligence” can be built across a wide range of applications.

SCL is about data and rules. This means that it is strongly integrated with SQL Server 2005 (obviously about data) and the .NET CLR (front-end or middle-tier applications, where rules traditionally live). In the case of the DIDI concept, the focus is more on SCL’s integration with the CLR. SCL includes a set of features that allows CLR programs (written in languages such as C# and VB.NET) to register variables, properties, or methods and events as facts in an SCL ADO.NET connection.

Each graph would register with the SCL ADO.NET connection the variable/property/method where the state of the graph can be found, as well as other values pertinent to helping SCL make good recognitions. Each graph would additionally register events that tell SCL when values have changed so that the state of the cockpit display can be refreshed.
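To give a feel for that registration step, here is a purely hypothetical sketch. IFactRegistry is a stand-in interface invented for this article, not SCL’s actual API; it only illustrates the two pieces of information each graph hands over: where its state can be read, and which event signals that the state has changed.

using System;

// Hypothetical stand-in for whatever registration surface the SCL ADO.NET
// connection actually exposes; invented here purely for illustration.
public interface IFactRegistry
{
    // Register a callback returning the graph's current state (e.g. "upward trend").
    void RegisterFact(string factName, Func<string> currentValue);

    // Register the event that tells SCL the fact should be re-read and inferences re-run.
    void RegisterChangeEvent(string factName, Action<EventHandler> subscribe);
}

public class CpuUtilizationGraph
{
    public event EventHandler StateChanged;
    public string CurrentState { get; private set; }

    public void RegisterWith(IFactRegistry scl)
    {
        // Where the state of this graph can be found...
        scl.RegisterFact("CpuUtilizationState", () => CurrentState);

        // ...and the event telling SCL that the value has changed.
        scl.RegisterChangeEvent("CpuUtilizationState", handler => StateChanged += handler);
    }

    // Called by the graph's own data-analysis code (see the MetricGraph sketch above).
    public void SetState(string state)
    {
        CurrentState = state;
        EventHandler handler = StateChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}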