
Orthogonalization Using Regression

Introduction

“Orthogonalization” is a method that helps to debias a dataset when some of the regressors and the predicted variable strongly influence each other, i.e. when the regressor of interest is not randomly assigned.

This is a common problem when we need to estimate price or promo elasticities, where price and demand feed back into each other. For example, when we try to model the demand for ice cream as a function of price, we often find that the price is also influenced by the demand (e.g. higher prices in the summer, when demand is high). This can lead to biased estimates of the price elasticity of demand.

The method is based on the Frisch–Waugh–Lovell theorem, and in this post it is explained with some Python snippets. These notes are based on the orthogonalization chapter of the book “Causal Inference for the Brave and True”.

Orthogonalization Using Regression

This procedure is a method to debias a dataset when a regressor and the predicted variable strongly influence each other, as is the case with price and demand. Usually, in order to estimate elasticity, we model demand as a regression on price and the other demand drivers.

\[Q = \beta_p P + \beta_X X + \beta_0\]

The problem is that the price is usually a variable that is set based on the demand, or on the expectation of the demand ( $P = f(E(Q))$ ); think again of ice cream prices and demand in summer.

Orthogonalization is a method to reduce that bias in the dataset, given that the variable under study ( $P$ ) is not randomly assigned but rather defined through an optimization process. The method is based on the Frisch–Waugh–Lovell theorem, and is also explained in the book The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
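
To make the later snippets self-contained, here is a minimal sketch of how such a confounded dataset could be simulated. All column names and numbers are assumptions chosen only so that the snippets below, which expect a DataFrame called `test` with `price`, `sales`, `cost`, `weekday` and `temp` columns, can be run end to end; they are not from the original example.

```python
# Hypothetical confounded pricing data: temperature and weekday drive demand,
# and price is set as a function of the *expected* demand, P = f(E(Q)).
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000

temp = rng.uniform(15, 35, n)        # confounder: daily temperature
weekday = rng.integers(0, 7, n)      # confounder: day of the week
cost = rng.normal(3, 0.3, n)         # confounder: unit cost

expected_demand = 50 + 2.0 * temp + 5 * (weekday >= 5)
price = 0.05 * expected_demand + 1.5 * cost + rng.normal(0, 0.5, n)  # P = f(E(Q))
sales = expected_demand - 4.0 * price + rng.normal(0, 2.0, n)        # true elasticity: -4

test = pd.DataFrame({"temp": temp, "weekday": weekday, "cost": cost,
                     "price": price, "sales": sales})
```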

The method

The idea behind the method is similar to the one behind the propensity score: we use the confounder matrix $X$ to predict the treatment variable $P$. We then isolate the part of the dependent variable $Q$ that is not explained by the confounders $X$, and try to explain it with the “as good as random” debiased treatment $\hat{P}$. These steps are detailed in the following procedure:

  1. First, we fit a model that predicts the treatment variable $P$ from the confounders (in our price elasticity example, $X$ = time/season/zone), and we keep the residual as a “debiased” variable $\hat{P}$ that contains the variance of $P$ not explained by $X$.
\[P = \theta_X X + \theta_0\]
import statsmodels.formula.api as smf

m_t = smf.ols("price ~ cost + C(weekday) + temp", data=test).fit()
test['price-Mt(X)'] = m_t.resid  # debiased treatment: price - pred_price(X)
  2. Secondly, we debias the outcome variable $Q$ following the same procedure as in the first step. The idea is to remove the effect of the confounders $X$ on $Q$, so that we keep only the variance of $Q$ not explained by $X$ ( $\hat{Q}$ ).
\[Q = \gamma_X X + \gamma_0\]
m_y = smf.ols("sales ~ cost + C(weekday) + temp", data=test).fit()
test['sales-My(X)'] = m_y.resid  # debiased outcome: sales - pred_sales(X)
  3. Finally, we regress our “debiased” outcome variable $\hat{Q}$ on our “debiased” treatment variable $\hat{P}$ (a formula-free equivalent is sketched just after this list).

\[\hat{Q} = \beta_p \hat{P} + \beta_0\]

# Q() quotes the residual column names, whose '-' and '()' patsy would otherwise parse as operators
m_y_debiased = smf.ols('Q("sales-My(X)") ~ Q("price-Mt(X)")', data=test).fit()
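
If the `Q()` quoting feels awkward, a formula-free alternative is a sketch like the following, which regresses the stored residual columns directly through statsmodels' array interface. On the simulated dataset from the earlier snippet the slope should land close to the true elasticity used to generate the data; the variable names here are my own.

```python
# Formula-free version of the last step: regress the debiased outcome
# on the debiased treatment, using the residual columns created above.
import statsmodels.api as sm

resid_q = test["sales-My(X)"]                    # debiased outcome,  Q_hat
resid_p = sm.add_constant(test["price-Mt(X)"])   # debiased treatment, P_hat, plus an intercept
m_y_debiased_alt = sm.OLS(resid_q, resid_p).fit()
print(m_y_debiased_alt.params)                   # the slope is the elasticity estimate beta_p
```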

We know from the Frisch–Waugh–Lovell theorem that this $\beta_p$ is equivalent to the one estimated by the original full regression

\[Q = \beta_0 + \beta_X X + \beta_p P\]

This is incredibly powerful for causal inference. It says that I can build a model that predicts my treatment $P$ using my features $X$, a model that predicts the outcome $Q$ using the same features, take the residuals from both models, and run a model that estimates how the residual $\hat{P}$ affects the residual $\hat{Q}$. This last model tells me how $P$ affects $Q$ while controlling for $X$. In other words, the first two models are controlling for the confounding variables: they generate data that is as good as random, and that is what debiases the dataset. This debiased data is what we use in the final model to estimate the elasticity.
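
As a quick numerical sanity check (a sketch, assuming the simulated `test` DataFrame from the earlier snippet), we can fit the full regression of the outcome on the treatment plus all the confounders and confirm that its price coefficient matches the slope of the residual-on-residual regression up to floating point noise:

```python
# Full regression: outcome on treatment plus all confounders.
m_full = smf.ols("sales ~ price + cost + C(weekday) + temp", data=test).fit()

print(m_full.params["price"])   # beta_p from the full model ...
print(m_y_debiased.params)      # ... should match the slope on the debiased price
```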

It is also worth noting that we should keep everything in the linear regression world in order for this orthogonalization argument to hold.

This post is licensed under CC BY 4.0 by the author.