What is omitted variable bias?

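Omitted variable bias is a form of bias, closely related to selection bias, that occurs in regression analysis when we leave out a control variable that both affects the outcome and is correlated with a variable we did include.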

To understand this concept, let’s dive into an example. Academic achievement is associated with higher earnings, and recent studies show that children with lots of books available to them at home tend to have better academic performance.

Does that mean that if parents simply line their shelves with books, their children will grow up to have high-paying jobs? Or is it possible we’ve omitted an important variable?

Let’s think about the characteristics of the parents. Is it possible that parents with a higher IQ both keep more books on their shelves and raise children with higher academic achievement? If so, is it the books on the shelves, the parents’ higher IQ, or a combination of the two that contributes to higher earnings potential?

In this case, the parents’ IQ is the omitted variable. Concluding that simply having more books on the shelves at home leads to academic achievement is a case of omitted variable bias: an important factor (the parents’ IQ) has been left out of the data analysis, so the apparent effect of books absorbs some of the effect of IQ.
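
To make the books example concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the variable names, the sample size, and a data-generating process in which parental IQ drives both the number of books and the child's achievement, while books have no direct effect at all. It compares a regression of achievement on books alone with one that also controls for IQ (computed by first removing the part of books explained by IQ, which gives the same books coefficient as a two-variable regression).

```python
import random

random.seed(0)
n = 20_000

# Hypothetical data-generating process (all numbers are invented):
# parental IQ raises both books at home and achievement;
# books themselves have zero direct effect on achievement.
iq = [random.gauss(100, 15) for _ in range(n)]
books = [2.0 * q + random.gauss(0, 20) for q in iq]
achievement = [0.5 * q + 0.0 * b + random.gauss(0, 5)
               for q, b in zip(iq, books)]


def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Regression that omits IQ: achievement on books only.
naive = slope(books, achievement)

# Regression that controls for IQ: strip the IQ-explained part out of
# books, then regress achievement on what is left.
books_net_of_iq = residuals(iq, books)
controlled = slope(books_net_of_iq, achievement)

print(f"books coefficient, IQ omitted:    {naive:.3f}")
print(f"books coefficient, IQ controlled: {controlled:.3f}")
```

Under these assumed parameters, the coefficient on books should come out clearly positive when IQ is omitted and close to zero once IQ is controlled for, even though books were given no effect in the simulation. In real data the true effect of books is unknown, so this only shows how the bias arises, not how large it is in practice.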