Reverse Engineering Second Nature Decisions
A Predictive Analytics Proof of Concept
by Eugene A. Asahara
December 23, 2010
After years of practicing a profession, the decisions involved in a professional's everyday activities become second nature. "Second nature" means a decision or action is nearly automatic: quick and seemingly effortless. The professional hardly ever thinks about why he or she makes such decisions or executes an intricate action, which is crucial when there are very many decisions to be made. Examples of such professionals and high-volume decisions include:
Although these quick decisions may seem easy for the professional, they are hardly mundane. The professional makes it look easy through years of neural pathways carved by a constant flow of thousands of cases. This second nature ability enables the professional to complete ever-increasing workloads as companies strive to do "more with less". The decisions are also rather subconscious. Ask a professional to explain how they do it and they will often say, "I never really thought about it," struggling to articulate the process.
Each such quick decision usually starts with a primary characteristic associated with the case. For example, if we think of Babe Ruth, the first thing that may pop into a baseball fan's mind is home runs. If I think of Elvis, the first thing that pops into my mind is the iconic white-suited Vegas version of Elvis. If we need to make fast decisions, our tendencies, prejudices, etc., are rooted in these first impressions.
These first impressions are the red flags that make us react instantaneously. Very often, we really only have a split second to respond before we're hit by a car or become some animal's dinner. But if we have another second or so, secondary impressions may come into play. For example, our secondary impressions of Babe Ruth may also remind us of a high batting average or the days when a world-class athlete could sport a less than ideal physique. These secondary impressions can prevent us from mistakenly hitting a friend who sneaks up to playfully surprise us.
What is the Value of Such Information?
As mentioned in the Overview, there are many types of expert-level information workers who must make many quick and important but similar decisions each day. What if we could reverse engineer the factors and weights the subject matter expert (SME) subconsciously applies? We could create a Predictive Analytics application to automatically make the same decisions. However, there are many caveats:
Many of these decisions are too important to leave to a relatively "stupid" Predictive Analytics model.
On the other hand, with care, there are profoundly valuable things we can do:
The development of such applications can be achieved by attempting to reverse-engineer the quick decisions made by SMEs, which means calculating the weights of various factors such that a formula derived from those weights (a regression formula) can fairly consistently predict an SME's decision through a score. The key thing to keep in mind is that it's not just the predictions of a predictive analytics model that are useful; the model of how those predictions are made is also valuable, offering a way into the mind of the decision maker. For example, if we interview for a job, we'll either get the job or not. But if we know something about what is valuable to the interviewers, we can better prepare. The following sections describe an experiment I conducted in 2009 as a proof of concept.
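The idea of a score derived from factor weights can be sketched in a few lines. This is a minimal illustration, not the experiment's actual formula; the factor names, scaled values, and weights below are invented for the example.

```python
def predict_rating(factors, weights):
    """Linear regression-style score: sum of factor values times weights."""
    return sum(weights[name] * value for name, value in factors.items())

# Hypothetical career stats, pre-scaled to comparable 0-1 ranges.
player = {"home_runs": 0.95, "batting_avg": 0.88, "all_star_games": 0.7}

# Hypothetical weights reverse-engineered from one rater's quick ratings.
rater_weights = {"home_runs": 40, "batting_avg": 35, "all_star_games": 25}

score = predict_rating(player, rater_weights)  # a 0-100 style rating
```

The whole game, then, is finding the weights that make such a formula track the SME's actual quick decisions.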
The Rating Baseball Players Experiment
To reverse-engineer the factor weights of subject matter experts, I employed a regression technique on a simple situation for which "experts" are readily available: rating baseball players on a scale of 0 through 100. I asked a number of my friends who are sports fans to rate 30 very well-known players with varying strengths.
The results of the experiment can be viewed at this link: Rate BaseBall Players. Click the link to play along as the sections below describe views obtained from it.
The SMEs were instructed to take no more than three seconds to rate each player; otherwise, the answers wouldn't reflect their underlying beliefs. The players had to be well-known, since an SME couldn't give an honest quick answer about an unfamiliar player.
The ratings were then processed through a rather vanilla genetic algorithm to devise weights for various qualities of a baseball player that best predict the ratings. The SMEs were not told what I intended to do with the data, much less how I would process the data or what factors were included.
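A "vanilla" genetic algorithm for this job can be sketched as follows. Everything here is illustrative: the data is synthetic (ratings generated from a hidden weight vector standing in for an SME's subconscious weights), and the population size, mutation rate, and three-factor setup are assumptions, not the experiment's actual parameters.

```python
import random

random.seed(42)

# Synthetic stand-in for the experiment: 30 players, three scaled
# career-stat factors, ratings produced by a hidden weight vector.
TRUE_WEIGHTS = [40.0, 35.0, 25.0]
cases = []
for _ in range(30):
    stats = [random.random() for _ in TRUE_WEIGHTS]
    rating = sum(w * s for w, s in zip(TRUE_WEIGHTS, stats))
    cases.append((stats, rating))

def fitness(weights):
    """Negative sum of squared prediction errors (higher is better)."""
    error = 0.0
    for stats, rating in cases:
        predicted = sum(w * s for w, s in zip(weights, stats))
        error += (predicted - rating) ** 2
    return -error

def evolve(pop_size=60, generations=200):
    """Elitist selection, per-gene crossover, occasional Gaussian mutation."""
    population = [[random.uniform(0.0, 100.0) for _ in TRUE_WEIGHTS]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]   # keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = [random.choice(pair) for pair in zip(a, b)]  # crossover
            if random.random() < 0.3:                            # mutation
                gene = random.randrange(len(child))
                child[gene] += random.gauss(0.0, 2.0)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

best_weights = evolve()
```

With real data, `cases` would hold each player's career stats paired with one SME's rating, and the evolved `best_weights` would be that SME's reverse-engineered formula.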
I selected the factors, which as I mentioned were unknown to the SMEs at the time of the rating. That's important, since knowledge of the factors would interfere with their subconscious rating. Each factor was selected to cover a wide range of qualities (all are career values):
Notice that I put a lot of thought into the factors rather than purely letting the data speak for itself. That is, I didn't just throw all sorts of stats at the ratings; I carefully selected them. Most predictive analytics endeavors in the real world will require some level of (or even much) guidance. It is the integration of human and machine intelligence, not artificial intelligence alone: we apply our human intelligence to guide the process. This list of factors is in fact the end result of the several iterations it took to come up with reasonable results.
I soon realized stats alone didn't work. The raters are human and thus significantly influenced by emotional factors such as steroid use, gambling, and "star quality" (Reggie Jackson being a good example of an overrated player). The main takeaway is that human activity is intimately tied to psychology, which involves complex patterns, and to politics, the goal of protecting one's turf. Both are usually logically unrelated to the task at hand, but they need to be accounted for to explain the end actions.
It's important to keep the number of factors as low as possible by using factors that are rather multi-dimensional, such as on-base percentage or slugging percentage (not shown), as opposed to one-dimensional factors such as home runs. Too many factors evens out the weightings and diminishes the prediction performance. The algorithm is capable of determining the factors that do indeed come into play for each SME. I chose not to engage that feature because, combined with the "Rating Game" feature, it would have placed too much of a burden on my ASP's servers. In a production environment, I would certainly engage that feature to improve the prediction performance.
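The "determine which factors come into play" capability can be pictured as a binary mask evolved alongside the weights, so the algorithm can switch a factor off entirely for a given SME. The names and numbers below are invented for the sketch.

```python
def masked_prediction(weights, mask, factor_values):
    """Score using only the factors whose mask gene is switched on (1)."""
    return sum(w * m * v for w, m, v in zip(weights, mask, factor_values))

weights = [40.0, 35.0, 25.0]   # evolved weight per factor
mask = [1, 0, 1]               # second factor switched off for this rater
factors = [0.9, 0.8, 0.5]      # a player's scaled career stats

score = masked_prediction(weights, mask, factors)
```

In a genetic algorithm, the mask bits would simply be extra genes on the chromosome, subject to the same crossover and mutation as the weights.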
Figure 1 depicts the raw ratings by the SME raters (columns) for the selected baseball players (rows). The color scale ranging from red to orange to yellow to light green to green helps us differentiate the general tone of each rater as well as spot outliers. If you squint your eyes, you can see the overall tinge of many of the columns. For example, Charley's column has a yellowish tinge, Homer's a green tinge, and Ringo's a red tinge.
Figure 1 - Ratings by the SMEs of the baseball players.
Looking at Figure 1 by column (that is, by SME), several things stand out:
It's important to keep in mind that the Avg column isn't important and can be misleading in this case. The regression formula is intended to determine factor weights for each rater, and the weights attributed to each factor are independent of the relative values. For example, Charley seems harsher on the players than Homer: Charley's highest rating is 80, whereas Homer hardly has a rating below 80.
The predicted ratings can be displayed by selecting the "Predicted" button illustrated in Figure 1. I didn't include that view here since it's not nearly as interesting as the "Difference" view depicted in Figure 2. This is the absolute difference between the raw rating and the predicted rating.
In this case, the "yellow/orange tinge" of the Avg column does show the predictions are, overall, only fair at best. These averages include Ringo's obviously humorous values, which significantly drag down the prediction performance. George's and Keith's columns show as very red as well. As mentioned, that's because of their severe punishment of the alleged steroid users. This punishment makes such an impact because the steroid users also happen to be the players with the highest home run totals. The contradiction between respect for the home run glamour stat and disrespect for steroid use greatly impedes prediction performance.
Figure 2 - Difference between the SMEs' ratings (scale of 0-100) and the prediction.
There are a few things I can do to improve the prediction performance. One is to be more robust in how cases are approached, not forcing a one-size-fits-all formula on each rater. The genetic algorithm has the ability to "turn off and on" factors (genes), providing various combinations. This would allow me to create multiple formulas for each rater if necessary. For example, for the steroid punishers (George, Keith, Paul), we could have one rule applied to non-steroid users and another for steroid abusers.
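The multiple-formulas idea amounts to routing each case to a different weight vector based on a rule. A minimal sketch, with made-up weights and a hypothetical steroid flag:

```python
def predict(player, clean_weights, steroid_weights):
    """Pick the weight set by rule, then apply the usual weighted sum."""
    weights = steroid_weights if player["steroid_user"] else clean_weights
    return sum(weights[k] * player["stats"][k] for k in weights)

# Hypothetical weight sets for one "steroid punisher" rater.
clean_w = {"hr": 40, "avg": 35, "all_star": 25}
harsh_w = {"hr": 10, "avg": 35, "all_star": 15}  # home run glamour discounted

slugger = {"steroid_user": True,
           "stats": {"hr": 0.95, "avg": 0.7, "all_star": 0.8}}
score = predict(slugger, clean_w, harsh_w)
```

Each branch's weights would be fit separately, so the contradiction between the home run stat and steroid disapproval no longer fights within a single formula.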
Figure 3 illustrates the weight values calculated for the regression for each SME rater. We can see what factors mean more or less to each SME. For example, we can see that George's big factors are HR, RBI, and All Star Games. We can see that the rater named Homer ironically doesn't put much weight on home runs.
As a reminder, the raters did not set up these weights. These weights were determined by a genetic algorithm that finds the weights that best fit all the single-valued player ratings by the rater.
Figure 3 - The SMEs' regression formula components.
Figure 3 also depicts the similarity of weight factors (the "Siml" column) between a selected rater and the other raters. In this case, we can see the similarity between Charley and the other raters. Charley's way of thinking is quite similar to George's (91.7; the three biggest factors for both of them are very close) and very dissimilar to Mick's (45.2). Charley has a value of 100 since he is, obviously, completely similar to himself.
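The exact metric behind "Siml" isn't spelled out here; one plausible choice is cosine similarity between two raters' weight vectors, scaled to 0-100 so that a rater always scores 100 against himself. The weight values below are invented for the sketch.

```python
import math

def siml(w1, w2):
    """Cosine similarity between two weight vectors, scaled to 0-100."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norms = math.sqrt(sum(a * a for a in w1)) * math.sqrt(sum(b * b for b in w2))
    return 100.0 * dot / norms

charley = [40.0, 35.0, 25.0]  # hypothetical evolved weights
george = [42.0, 33.0, 24.0]   # close in all three big factors

self_score = siml(charley, charley)
pair_score = siml(charley, george)
```

Any distance over the weight vectors would do; the point is that comparing formulas, rather than raw ratings, compares how the raters think.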
Figure 4 is an extension of Figure 3 in that it shows a matrix of the "Siml" column of Figure 3. It's interesting to see that the column of Ringo, the contrarian prankster, is tinged red.
Figure 4 - Rater similarity based on the similarity of the regression formula.
What I'd like to do with the similarity matrix is to find like-minded SMEs. It's not that interesting in this case of baseball raters. But what if the SMEs were doctors and we were working on how best to determine whether to order an expensive test such as an EEG? There are many valuable things:
The Rating Game
Hitting the right-most link shown in Figure 4 brings up the Rating Game, as depicted in Figure 5. In the Rating Game, you rate the first 20 players. After the 20th player, the regression is calculated with the components shown on the right, and a predicted rating based on that regression is shown under the "Predicted" column for the remaining players.
Figure 5 - The Rating Game. Rate the first 20 players and see how good the regression is for the next 10 players.
This blog post presents the basic principles behind a method for capturing "the gist", the essence of a human's way of making a decision. It's certainly not as simple as this, but with more work, a system that provides incredible value as another set of eyes could be built.
Of course, adoption of such systems must also overcome tremendous political friction, especially in the medical community.