
What is omitted variable bias?

Omitted variable bias is a type of selection bias that occurs in regression analysis when we don’t include the right controls.
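For readers who want the mechanics, the bias has a compact form in the textbook linear-regression setting. This setting is an assumption, since the article itself states no model. Suppose an outcome Y truly depends on a variable X and on a second variable Z, but we regress Y on X alone and leave Z out. The notation below is ours, and the result assumes the error term ε is uncorrelated with both X and Z.

$$ Y_i = \alpha + \beta X_i + \gamma Z_i + \varepsilon_i $$

$$ \operatorname{plim}\, \hat{\beta}_{\text{short}} = \beta + \gamma \, \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)} $$

The second term is the omitted variable bias. It disappears only when Z has no effect on Y (γ = 0) or when Z is uncorrelated with X. In the books example below, X would be books at home, Z the parents' IQ, and Y academic achievement. If IQ raises achievement and is positively correlated with books, the books coefficient is overstated.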

To understand this concept, let’s dive into an example. Academic achievement is associated with higher earnings, and recent studies show that children with lots of books available to them at home tend to have better academic performance.

Does that mean that if parents simply line their shelves with books, their children will grow up to have high-paying jobs? Is it possible we’ve omitted an important variable?

Let’s think about the characteristics of the parents. Is it possible that a higher parental IQ could lead to both more books on the shelves and greater academic achievement for their children? If so, is it the books on the shelves, the parents’ higher IQ, or a combination of the two that contributes to higher earnings potential?

In this case, the parents’ IQ is the omitted variable. To assume that simply having more books in the home leads to academic achievement is a case of omitted variable bias, which just means that an important factor (the parents’ IQ) has been left out of the data analysis.
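The books-and-IQ story can be made concrete with a small simulation. This is a minimal sketch on made-up data, not an analysis from the studies mentioned above. Every number in it is a hypothetical assumption: parents' IQ (standardized) drives both the number of books (books = 2 × IQ + noise) and a child's score (score = 3 × IQ + noise), and books have zero true effect. The helper names `simulate`, `short_regression` and `long_regression` are illustrative and hand-rolled with ordinary least squares, so only the Python standard library is needed.

```python
import random


def simulate(n=100_000, seed=0):
    """Generate hypothetical data where parents' IQ drives both books and scores.

    All coefficients are made-up assumptions for illustration:
    books = 2*IQ + noise, score = 0*books + 3*IQ + noise.
    """
    rng = random.Random(seed)
    iq, books, score = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                       # parents' IQ (standardized, hypothetical)
        x = 2.0 * z + rng.gauss(0.0, 1.0)             # books on the shelves, partly driven by IQ
        y = 0.0 * x + 3.0 * z + rng.gauss(0.0, 1.0)   # books have zero true effect on the score
        iq.append(z)
        books.append(x)
        score.append(y)
    return iq, books, score


def centered_cross_sum(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def short_regression(x, y):
    """OLS slope of y on x alone, i.e. the regression that omits IQ."""
    return centered_cross_sum(x, y) / centered_cross_sum(x, x)


def long_regression(x, z, y):
    """OLS slopes of y on x and z together, solved from the 2x2 normal equations."""
    sxx = centered_cross_sum(x, x)
    szz = centered_cross_sum(z, z)
    sxz = centered_cross_sum(x, z)
    sxy = centered_cross_sum(x, y)
    szy = centered_cross_sum(z, y)
    det = sxx * szz - sxz * sxz
    b_x = (szz * sxy - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return b_x, b_z


if __name__ == "__main__":
    iq, books, score = simulate()
    print("books only:    ", round(short_regression(books, score), 2))
    b_books, b_iq = long_regression(books, iq, score)
    print("books, with IQ:", round(b_books, 2))
    print("IQ, with books:", round(b_iq, 2))
```

Under these assumed numbers, the books-only slope should come out close to 1.2 (that is, 0 + 3 × 2/5), even though books have no real effect. Once IQ is included as a control, the books coefficient should fall to roughly 0 and the IQ coefficient should be close to 3.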


Play With Data


We are surrounded by data. Smartphones, doctors, schools, fitness trackers, governments, sports, researchers, and web apps are creating mountains of data. Big data. We often want to look for patterns in that data to help us answer a variety of questions, from the serious, like what causes heart disease, to the fun, like who will be the next great football player.


For example, you might be interested to know what patterns are associated with higher pay at your job. Well, no big surprise, but academic achievement goes hand-in-hand with earning lots of money. And studies have shown that if you grow up in a house with lots of books on the shelves, you tend to do better in school. So books on the shelves cause you to do better at school, which leads to more pay. Easy enough. Add books to your shelves and you'll make a lot more money.


But wait, is it the number of books on your shelves that actually causes better academic performance? Is it possible that your parents' higher IQ leads to both more books on your shelves and better academic achievement for you? Looking at just books and academic performance without considering your parents' IQ would be a classic case of what's called omitted-variable bias. Or could we be seeing what's called reverse causation? That is, academic achievement causes more books, and not the other way around. Don't worry. These terms sound confusing, but they are not.


Omitted-variable bias sounds fancy, but it just means you left out an important factor: in this case, your parents' IQ when studying academic achievement. Understanding these terms, and more broadly understanding how to make sense of data, is a crucial skill in the modern world, as data analysis is spilling into almost every industry and phrases like regression analysis, correlation coefficients, and p-values are showing up everywhere. You're going to dive into understanding data not through a typical lecture format but through interactive play. You'll play with some fascinating real-world data sets and, through that exploration, learn the intuition behind statistical analysis and econometrics.