The Data Don’t Always Say What You Want to Hear: The Role of Cognitive Bias in Data Analysis

If there’s something you really want to believe, that’s what you should question the most.

Penn Jillette

It was 6th grade. I was sitting in the back of a (boring) language arts class when the boy seated next to me – a boy that I’d had a crush on for the entire semester – turned in my direction and mouthed the words,

“I love you.”

At least, that’s what I thought he voicelessly said to me. It was only after he started laughing that I realized that he had instead mouthed the phrase, “Elephant shoes,” which for some reason wasn’t quite as exciting.

That same sort of miscommunication can happen when we “talk” to data. We go in with our own expectations and our own cognitive biases that sometimes make it hard for us to get out of our own way and listen to what the data have to say.

While the complete list of cognitive biases is both eye-opening and a bit frightening (I can’t actually be that irrational, can I???), the following are the main ones that I come across when working with – and talking about – data:

Confirmation Bias

We see what we expect to see and we’re more apt to go looking for support that we’re right than proof that we’re wrong.

Let’s say that you have a hypothesis that housing prices in your area have increased over the past six months. You download the data and start to do some exploratory analysis.

“Ah ha!” you think, “I was right!” as you notice that the mean sales price has indeed steadily risen. Doing your due diligence, you also check the median price, which has a decidedly downward trend. The natural instinct is to then justify a reason why the statistic that validates your expectation is correct, while the other one is obviously in error.

As a data professional (or, one could argue, as a thinking human being), it is okay (and often required) to begin with some ideas or hopes for what you may find. Yet, it is critical to keep an open and curious mind.

Instead of telling the data what you want to hear, ask the data what they have to say.

In the housing example, rather than discard the median as some sort of fluke, the next step would be to dig deeper to figure out why those trends are moving in opposite directions.
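To see how both trends can be true at the same time, here’s a toy illustration (the numbers are made up, not real housing data): a handful of unusually expensive sales is enough to drag the mean upward even while the median, the middle of the pack, drifts down.

```python
# Toy illustration only: made-up monthly sale prices, not real housing data.
# A few luxury sales pull the mean up while the median drifts down.
from statistics import mean, median

monthly_sales = {
    "Jan": [250_000, 260_000, 270_000, 280_000],
    "Feb": [240_000, 255_000, 265_000, 900_000],    # one luxury sale
    "Mar": [230_000, 245_000, 260_000, 1_500_000],  # an even bigger outlier
}

for month, prices in monthly_sales.items():
    print(f"{month}: mean = {mean(prices):,.0f}, median = {median(prices):,.0f}")

# Jan: mean = 265,000, median = 265,000
# Feb: mean = 415,000, median = 260,000
# Mar: mean = 558,750, median = 252,500
```

Same data, two honest summaries, two opposite stories. Which is exactly why the next step is to dig in rather than pick a favorite.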

Not only do we have to be on the lookout for confirmation bias in ourselves, we also have to be aware of its power when presenting data findings to others, especially when the information runs counter to their previously held beliefs. Without careful presentation, it’s easy for others to become defensive of their position and accuse the data – or you – of being wrong.

In order to make an audience receptive, it’s important to deliver the information in a manner that both protects the ego AND ignites curiosity. With the housing example, you could start by finding out what their expectations are (that prices are rising or falling) and then follow up with “You’re right AND…” while displaying a graphic showing the trends of both the mean and median prices. “How could both of these trends be true at the same time? Here’s what I found…”

Sunk Cost Fallacy

I found myself starting to fall into this one in one of my projects. I started out with the intention of using regression to predict a continuous metric to measure YouTube impact (combining watch time and number of views).

For a full week, I prepared the data for modeling. This required wrangling with persnickety APIs and carefully cleaning the text data before joining the related tables. This was followed by looking through each feature, deciding how to handle unusual values and engineering new features based on what I had. Next, I wrote and applied functions to split and prepare the data using three different methods: bag of words, TF-IDF and word2vec. Throughout, I kept realizing that I hadn’t cleaned the text quite well enough, and so I would go back and add a little more regex to my laundering functions.
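None of my actual project code appears here, but a rough sketch of what the bag of words and TF-IDF versions of that split-and-prepare step can look like with scikit-learn is below (the video titles and impact values are invented; the word2vec variant would typically use a library like gensim and average each document’s word vectors):

```python
# Rough sketch with invented example data, assuming scikit-learn is available.
# Two of the three text representations mentioned above: bag of words and TF-IDF.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split

titles = [
    "how to learn python in 30 days",
    "my honest review of data science bootcamps",
    "machine learning explained for beginners",
    "why your regression model is failing",
]
impact = [3.2, 1.1, 4.8, 2.5]  # made-up continuous target (watch time / views composite)

X_train, X_test, y_train, y_test = train_test_split(
    titles, impact, test_size=0.25, random_state=42
)

# Bag of words: raw token counts per document.
bow = CountVectorizer(stop_words="english")
X_train_bow = bow.fit_transform(X_train)
X_test_bow = bow.transform(X_test)  # transform only: no peeking at the test set

# TF-IDF: counts down-weighted by how common a term is across documents.
tfidf = TfidfVectorizer(stop_words="english")
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
```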

Finally, it was time to model.

And the results were … less than stellar.

Actually, that’s a lie.

They were horrible. No matter what combination of regressor and text preparation I tried, the coefficient of determination was low and the error was high. Even the neural networks threw up their hands in defeat.

But I didn’t listen. You see, I had put SO much time and energy into the plan that I wasn’t willing to throw it away.

The sunk cost fallacy describes our tendency to stick with something we have invested time, money or energy into, even when the cost of holding on outweighs the benefits. It’s called a “sunk” cost because we’ve already invested our resources. I also like to think that “sunk” refers to the fact that we’re tied to our investments like an anchor that can often weigh us down.

There’s always a cost to holding on. And sometimes you have to let go of one thing in order to reach for something better.

One of the important skills to have when working with data is knowing when it’s time to walk away from an approach. No matter how long you took getting there.

Oh, and on that project? Once I pivoted to classification, it worked out beautifully.

Action Bias

Pick up any research journal and do a quick scan of the abstracts. I expect that you would find very few papers that did not find a significant effect. Does this mean that most research hypotheses go on to be validated? Of course not. What you’re seeing is a version of the action bias.

It’s not very exciting to do a bunch of work and then exclaim, “Eureka! I found nothing!” But that’s (most) often the case. Usually all of that work results in no action apart from a closed file and the knowledge not to try that approach again.

The action bias reflects the human desire for forward progress. Think about being stuck in stop-and-go traffic for an hour versus driving an hour out of your way on empty roads. Most of us would choose the latter option, because at least it feels like we’re doing something.

Action bestows a feeling of influence. Of power. Of purpose.

This is why it’s important to present actionable items alongside any dead ends. So our A/B testing showed that the current “buy now” button is better? Cool. So we don’t take action there, but maybe we test out a new header image. That way it’s reframed as a change in direction, rather than a halt to the action.
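For a concrete (and entirely made-up) version of that button test, a two-proportion z-test is one common way to check whether “the current button won” is more than noise, assuming statsmodels is available:

```python
# Minimal sketch with invented numbers: did the existing "buy now" button
# really outperform the challenger? Requires statsmodels.
from statsmodels.stats.proportion import proportions_ztest

conversions = [156, 110]   # purchases: current button, new button
visitors = [2400, 2380]    # visitors shown each variant

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
rates = [c / n for c, n in zip(conversions, visitors)]

print(f"current: {rates[0]:.1%}  new: {rates[1]:.1%}  z = {z_stat:.2f}  p = {p_value:.4f}")
# With these made-up counts the current button wins convincingly (p < 0.05),
# so the next "action" isn't changing the button: it's picking the next thing to test.
```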

IKEA Effect

Which would you place more value on? A mass-produced chair that you painstakingly assembled from the box in which it arrived, or that same chair – brand new but already put together – offered to you for a price?

When we build it, we appreciate it.

In the data world, this shows up when the data professional spends so much time in the data that they forget to open the blinds and peek out into the world occasionally. They spend weeks or months working on a project. The analysis is insightful, the results are significant, and the models score the machine learning equivalent of an A+ on their tests. They develop a sense of ownership of the project and feel pride in their efforts. Efforts that they are sure anyone else would appreciate just as much.

“Look what I found!” they exclaim, running into the boardroom. “See this? It’s amazing! This will transform the business!”

But the reaction is lukewarm. Even if the proposed solution addresses the business problem, a finished product is never going to have the same value as one that you painstakingly built by hand. And that’s important to keep in mind – your job is to make it work and to make it easy to understand, not to get people to empathize with the process of getting there.

Back to the mass-market chair. To an outside observer, once assembled (assuming you know your way around a hex wrench), the seat you built is no different than the one purchased ready-made.

Hindsight Bias

When we look back now, we can easily see the factors that contributed to the housing collapse in 2008. They seem so obvious, these threads leading directly towards this singular outcome. So we shake our heads at the people in the past, wondering how they didn’t predict this particular outcome.

If you’re too certain, you’re likely either wrong or looking backwards.

The problem comes in when we inflate our ability to make predictions (real predictions, like in the future) because we are such rockstars at Monday-morning quarterbacking what has already happened.

Part of the reason that things like the 2008 housing crash happen is that people are TOO confident in their forecasts. It just happens that they were putting their faith in the wrong ones.

I think this may actually be a positive outcome of Covid – we’re all learning to become a bit more comfortable with uncertainty.

No matter how much we try to eliminate them, we will always have cognitive biases. Therefore, it’s important to learn how to recognize them and challenge the conclusions they erroneously lead us towards. Sometimes, we just need to get out of our own way and listen to what the data are trying to tell us.

Lesson of the Day

I really hate self-promotion. But since I’m unlikely to get a job without telling/showing people what I can do, I’m having to learn to get over my discomfort and get better at it.

Frustration of the Day

Not a fan of Tableau Public. The versions keep changing and my saved work keeps disappearing.

Win of the Day

I’m trusting the process :)

Current Standing on the Imposter Syndrome Scale

2/5

I got this!
