Imperfect Information

by Eugene A. Asahara
Created: March 6, 2009; Last Update: March 9, 2009

The human brain makes countless decisions within a web of dependencies with such success that most of us die from a cause other than a fatal mistake by the brain. It does this even though most decisions are based on values that are, to varying degrees, guesses, ranging from wild hunches to highly researched educated guesses. We live in a world of "imperfect information". Hardly anything we know can be considered absolutely true, and we often make our biggest mistakes when we forget this.

The unmatched intelligence of our human brains evolved in large part to handle imperfect information. Following are short discussions of some of the phenomena that result in imperfect information:

Humans are fallible - No one is perfect. Humans are general-purpose machines: we do practically nothing better than the specialist species around us, but we do most things very well, which makes us highly adaptable, and that adaptability is much of the secret to our success as a species. Because we make mistakes, any communication from one human to another, or any information taken in through our senses, is subject to misinterpretation.

Another example of our fallibility is that we tend to hear what we want to hear or expect to hear. This isn't a bad thing at all; this feature lets us make simple decisions faster, avoiding getting bogged down in analysis paralysis.

Objects and Distance obscure our view - As our ancestors hunted, trees and boulders obstructed their view. Through the gaps they could perhaps see something that might be a tiger or might be a shadow. Distance obstructed their view as well, both for the obvious reason that the farther away something is, the harder it is to see, and because of atmospheric obstructions such as fog. Distance also "obscures" sound and smell.

This can also take the form of data we simply do not have access to, whether it is safe from our prying eyes buried under a mountain or locked away on a secure server. However inaccessible the data may be, we can still attempt to infer its value and take an action based on that guess.

Time obscures our view - In a sense, the dimension of time isn't very different from the spatial dimensions. Data is obscured through time as the artifacts of an event deteriorate. We have techniques such as carbon dating to counter the effects of time, but the result is still a guess.

Our predators and prey lie to us (misinformation) - Predators and other enemies employ all sorts of tactics to make things seem like something else, whether pretending they aren't really lurking in the bushes or making that worm on a hook look like an easy meal for a fish. Prey lie to us as well, camouflaging themselves or making themselves seem bigger than they really are. The eternal struggle between predator and prey is more about misinformation than about greater physical prowess.

Although we generally don't need to worry about lions and tigers and bears stalking us, the notion of predator and prey still plays a big part in human life. Businesses prey on customers with marketing gimmicks and soldiers prey on and hide from other soldiers.

Privacy - Privacy is information to which we have no access even though an answer does exist. Privacy differs from misinformation in that, with privacy, information is simply not revealed; it is essentially a null value. Misinformation is not a null answer; it is a purposefully wrong answer intended to affect our actions through false information we take to be true. (Coy silence can be considered misinformation, though: depending on the question, no answer amounts to "yes".) Privacy need not be something a human declines to tell us; it could be information locked in a safe.

The human consciousness can really only focus on one thing at a time, and overlooks what it seeks even when it's right in front of us - This is somewhat related to the notion that our predators and prey lie to us, because we're not necessarily intentionally looking for the predator or the prey. If we're stalking through the woods for deer, we may be on the alert for lions and tigers and bears, but our conscious mind is trained on spotting deer, our prey. That focus on the prey lowers the chance we'll notice a well-hidden predator waiting for us to come close enough; the hunter becomes the hunted.

A great example of this is a classic puzzle: how many occurrences of the letter F are in the following sentence? FINISHED FILES ARE THE RESULT OF YEARS OF SCIENTIFIC STUDY COMBINED WITH THE EXPERIENCE OF YEARS.

Whether you already know this puzzle or got it right or wrong doesn't matter. What matters is that overlooking the Fs in the word "OF" is an example of this point.

This notion is subtly similar to the previous one. The difference is that the previous notion focuses on our conscious hunt for prey, which lies to us by camouflaging itself. While our conscious mind scans the surroundings for prey, the background processes of our minds stay alert for predators, which are waiting for us to come closer.

The sheer volume of data is beyond our capability to adequately process (Information Overload) - This is the "Enron tactic" and a common tactic in law: overwhelm your opponent with a haystack from which they must recognize the needles.

The complexity of the world makes it impossible to track everything required to make conclusive predictions - Nothing ever happens exactly the same way. It may look exactly the same, but it really isn't. In order for two events to be exactly the same, the conditions need to be exactly the same.

Latency - Sometimes time is of the essence and we don't have time to research something as deeply as we would like. Therefore, at times we need to take a wild guess.

Low Probability/High Impact Events - When discussing imperfect information, it's important to point out that even things we have always known to be true may not always be true. Imperfect information may seem to be about favoring values of higher probability. That is true, but there must also be a mechanism monitoring low-probability events. Such events have the potential for severe disruption because the "background systems" of our lives are optimized for the highly probable (i.e., what happens most of the time).

Our World of Imperfect Information

The creatures of Earth live in an environment that is for the most part very stable. Thankfully, it is only "for the most part" stable and not absolutely stable; otherwise, things would be extremely boring. Our earthly environment is built upon a stable foundation: the laws of physics (which, for the sake of argument, remain constant), a sun that burns at a fairly constant temperature, and an Earth that orbits the sun in a well-established pattern. That pattern sits in a sweet spot: slow and consistent, with enough variation to make things interesting but not chaotic. Built on top of that platform, on the crust of the Earth, is a wide array of patterns, ranging from the huge but extremely slow movements of plate tectonics, to the quick and slow divisions of cells, to the competition between species.

Those patterns long ago fell into an unimaginably complex rhythm, resulting in a high degree of orderly but robust change. Everything comes and goes: landscapes come and go, species come and go, and individuals come and go. This incredible complexity and constant change makes it challenging to find anything that could be considered absolutely true both today and tomorrow. The only way for something to be absolutely true would be for time to come to a complete stop, leaving no tomorrow and no next time. But for life on Earth it's good enough, in fact imperative, that the rhythm continue, so that a value that is pretty much true today remains pretty much true tomorrow, and only occasionally turns out not so true.

However, self-aware humans attempt to impose their own order on these timeless rhythms, with pretty fair success. We've grown our species to large numbers, conformed much of the land to our needs, and extended our lives significantly. The wave of change rolls over us constantly as we spend a great deal of our time maintaining our buildings and homes and doing what we can to preserve our health. We organize into a sophisticated hierarchy of competing social groups, ranging from the family unit to corporations to allied blocs of nations.

All of that human effort is accomplished despite imperfect information. Although humans still die and nations still come and go, we do a better job of controlling our fate than other creatures, largely thanks to human intelligence. We further impose our order through the development and enlistment of high technology, of which Information Technology is one of the primary domains today.

Imperfect Information Mechanisms

Creatures can deal with imperfect information in at least two major ways:

  1. Creatures can concern themselves with survival at the species level and play the odds. What I mean by this is that well-formed logical rules are effective. The survival of the species is analogous to the relationship between the blackjack player and the house: the rules are such that the house ends up with a small but very consistent edge.
  2. If creatures can think symbolically, they can perform experiments in their heads before committing to a physical, and probably irreversible, action.

The lower creatures of the world are limited to the first option. Individuals don't matter, but the species does. Think of a Dungeness crab you've ordered for dinner. That crab began life as one egg among the more than two million carried by its mother. Without any intelligent intervention, over a few years that crab grew to maturity, beating incredible odds rivaling those of a lottery winner.

There was no thinking involved. The crab literally went with the flow during its first few months as plankton. Its simple brain recognized situations and acted on fairly simple rules. If all crab larvae follow those few simple rules (they really have no choice, since they are "dumb" animals), a certain percentage of them will grow to sexual maturity and keep the species going.

These rules are all based on statistics. If so many sardines are encountered, some number of larvae will survive. The number of sardines is in turn determined by the number of predator fish and mammals, and so on. The rules have evolved such that the actual numbers in each part of the system need only fall within a range; they do not need to be exact.

Sometimes the rules don't work out, and it's the end of the game for the species. A freak string of cold seasons or an exceptionally large population of sardines can disrupt the odds. Similarly in blackjack, a string of bad hands can render the house or the player extinct. When our intelligence fails us, when we don't see the bus as we cross the street or fail to cure our cancer, we die and others say, "his number came up". These are the outlier low-probability/high-impact events I mentioned earlier.
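The blackjack analogy can be sketched numerically. Below is a minimal Monte Carlo sketch; the 49% win probability, bet size, and bankroll are invented for illustration, not real blackjack odds. It shows both halves of the point: a small, consistent edge wins reliably in aggregate, yet a streak of bad hands can still wipe out an individual player entirely.

```python
import random

def simulate(bankroll, hands, p_win=0.49, seed=0):
    """Play fixed one-unit bets; p_win < 0.5 models the house's small edge.
    Stops early (returns 0) if the player goes broke."""
    rng = random.Random(seed)
    for _ in range(hands):
        if bankroll <= 0:
            return 0  # ruined: a streak of bad hands ended the game
        bankroll += 1 if rng.random() < p_win else -1
    return bankroll

# Over many players, the small consistent edge grinds bankrolls down,
# while an unlucky few bust outright before the session ends.
results = [simulate(bankroll=100, hands=2000, seed=s) for s in range(200)]
print(sum(results) / len(results))  # on average, noticeably below 100
```

The species-level version is the same structure: as long as the per-generation "odds" stay within their evolved range the population persists, but an outlier streak can push it to the absorbing zero.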

Symbolically-thinking humans have the capability for the second option. When we recognize something such as a cougar, we don't automatically run away from it. We consider other aspects of the current situation such as whether the cougar is in a cage or it's someone's relatively tame pet.

The human brain is an "imperfect information machine". Much of the secret lies in two innate mechanisms, both implemented in SCL: "Iterative Recognition" and "Decoupled Recognition and Action". A full description of these mechanisms is beyond the scope of this essay; they are described in more detail in separate articles/essays.

Here, I will provide a short description. Iterative Recognition is essentially the whittling down of a long list of possibilities as we face a situation. For example, if we're walking through a meadow and see a large dark figure in the distance, what could it be? We don't know, because we have imperfect information. Is it a bear, a tree trunk, a black cape? We walk closer to identify it. It doesn't move; that rules out some things. We don't want to get closer, so we toss a rock at it. It still doesn't move, which seems to rule out any animal. We walk closer, until we see it's a bear holding up branches, disguising itself as a weird tree to lure us close to it.
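The whittling-down process can be sketched as a simple filter over hypotheses. Everything below (the candidate list, the trait names, the observations) is a hypothetical illustration of the idea, not the actual SCL implementation:

```python
# Each hypothesis declares what a given observation would imply about it.
candidates = {
    "bear":       {"moves_when_hit": True,  "has_bark_texture": False},
    "tree trunk": {"moves_when_hit": False, "has_bark_texture": True},
    "black cape": {"moves_when_hit": False, "has_bark_texture": False},
}

def refine(candidates, observation, value):
    """One iteration: keep only the hypotheses consistent with new evidence."""
    return {name: traits for name, traits in candidates.items()
            if traits[observation] == value}

# Toss a rock: it doesn't move...
candidates = refine(candidates, "moves_when_hit", False)
# ...walk closer: it has bark-like texture.
candidates = refine(candidates, "has_bark_texture", True)
print(sorted(candidates))  # ['tree trunk']
```

Note that the surviving hypothesis can still be wrong, exactly as in the essay's disguised-bear ending: the filter is only as good as the evidence, which is itself imperfect and can be faked.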

Decoupled Recognition and Action is a little different. Once we recognize something, what do we do? Just because we recognize a cougar doesn't mean we run. Is it in a cage? Is it dead? For everything we encounter, we're able to pause and analyze the situation in order to select an appropriate action. We consider the severity of the consequences if the selected action is wrong, and the availability of contingency plans or back doors in case we're wrong.

These two mechanisms allow humans to side-step the fate dictated to us by simple, statistics-based rules that other creatures yield to.

Our systems of today, whether automobiles or IT applications, are engineered collections of perfect components (or at least we like to think they are). Each component is carefully crafted and performs a function that, taken in isolation, is almost meaningless. However, we carefully assemble these components into useful systems.

Our information systems deal with the fact that imperfect information is the norm by adding mechanisms to address the problems we encounter. We either modify the application or develop another system, such as firewalls, virus scanners, and validation mechanisms, to protect it from the complexities of life. In real life, we create laws and erect fences (literal and metaphorical) to keep people from venturing outside a safe bubble.

How many things can happen against which we must protect our systems or other agents in the future? And how do we fix each problem without creating more? How many ways can a person die? How many ways can a competitor challenge us? If we've lived long enough, we know the silliest cliché is "Now I've seen everything." Before long, we have a mess so unwieldy (like a 30,000-page tax code) that it is itself a generator of imperfect information.

In the software development world, we fix so many bugs and oversights that eventually we scrap the code and build a new version from scratch. If we're good designers, we attempt to categorize the fixes and design a component that addresses the entire category, including issues we'll only encounter in the future. In the past, things like error handling and variable scoping were afterthoughts in a programming language; now they are native features that address entire classes of problems.

If we accept that information is innately imperfect, we can either restrict what we do, becoming slaves to our systems, or begin to build in the handling of imperfect information. The former is what we currently do. IT is composed of applications, each a tool created and optimized for a specific task and never meant to venture outside its realm. Each application is composed of rules that are for the most part hard-coded, expecting perfect information as input and producing a supposedly perfect answer as output.

With that said, Data Warehousing is a first step toward dealing with imperfect information: by integrating data from those task-specific systems, it at least allows a human user to discern when something is out of bounds. Figure 1 depicts an "AI Stack". At the bottom are the task-specific applications relying on perfect (or at least perceived-to-be-perfect) information.

Figure 1 - The "AI Stack".

Data from these task-specific systems is integrated into a data warehouse, providing an all-encompassing view. An interesting feature of Figure 1 is the gray dashed line just above the Data Warehouse block. Everything below the dashed line is perfect, factual data. Above the line, however, values grow fuzzier, first as statistics-based data mining predictions, then as what can begin to be described as "intelligence".

At the time of this writing (March 2009), Data Mining is beginning to seep into mainstream IT. The fuzzy nature of the values returned by querying data mining models represents a step toward the imperfect nature of data coming front and center. Data Mining results come accompanied by standard deviations, probabilities, and to some extent a lineage of the calculation (somewhat like an OLAP drill-through).

"Intelligence" (thinking) is computationally intense (which is another reason for the software developers reading this to master those concurrent programming skills - which will be a way bigger leap than procedural programming to object-oriented). However, computer hardware, infrastructure, and software tools are at a point where we can begin to incorporate pragmatic artificial intelligence as a foundational consideration along with other foundational considerations such as distributed computing. At least, we can start by incorporating the fuzziness of statistics-based "best guess values" (along with the accompanying standard deviation) supplied by data mining systems to acknowledge the imperfect nature of information.

Related Articles:

Predictive Analytics Drilldown - When fuzzy data is surfaced on conventional software windows, there should be an easy indicator of the confidence in the value.

Systems Thinking and Imperfect Information - The foundation for Predictive Analytics consulting.