Packages

Here are the packages we will use in this lecture. If you don’t remember how to install them, you can have a look at the code.

install.packages( stargazer )
library( "stargazer" )

Learning objectives

1 The key concept

The difference-in-difference (diff-in-diff) is a powerful model which allows us to look at the effect of a policy intervention by taking into consideration:

  1. how a group mean changes before and after a policy intervention (treatment group) AND
  2. compare this change with the mean over time of a similar group which did not undergo the treatment (control group).

We can see a representation of the model in figure 1.1. We have two groups, one which underwent the treatment and one which did not undergo the treatment (control group). For both groups we observe the average outcome before and after the intervention.

Difference in difference model

Figure 1.1: Difference in difference model

As suggested by the name, the diff-in-diff estimator is the difference of their mean differences. More clearly, the diff-in-diff estimator takes

and substracts:

as in the following formula:

\[\begin{equation} \text{(Treatment_post - Treatment_pre) - (Control_post - Control_pre) = Diff-in-Diff estimate} \tag{1.1} \end{equation}\]

We can calculate the difference-in-difference based on graph 1.1, as below:

\[\begin{equation*} \text{(85 - 50) - (55 - 35) = 15} \end{equation*}\]
A more detailed calculation can be seen in table 1.2, the red number in the right down corner is the diff-in-diff estimator - the key parameter we will estimate in our model. When the difference-in-difference estimate is different from 0, then the treatment has had an effect.

Difference in difference, an example

Figure 1.2: Difference in difference, an example

It is important to note that in a diff-in-diff model:

  • We are going to look at grouped data, as we will always compare a treatment and a control group.

  • The coefficients will represent group means and their differences. The diff-in-diff model does not include a slope.

1.1 The statistical model

In mathematical terms, we are interested in estimating 3 coefficients, as in the following equation:

\[\begin{equation} \text{Y} = \beta_0 + \beta_1*Treatment + \beta_2*Post + \beta_3 * Treatment * Post + \text{e} \tag{1.2} \end{equation}\]

Where:

\(\text{Y}\) is our outcome variable;

\(\text{Treatment}\) is a dummy variable indicating the treatment (=1) and control (=0) group;

\(\text{Post}\) is a dummy variable indicating pre (=0) and post (=1) treatment;

\(\text{Treatment * Post}\) is a dummy variable indicating whether the outcome was observed in the treatment group AND it was observed after the intervention (=1), or any other case (=0).

Our dataset will look like this, where two subjects undergo the treatment and two subjects did not undergo the treatment. For each subject we registered post and pre treatment outcomes, such that we have two observations for each subject:

Diff-in-Diff dataset

Figure 1.3: Diff-in-Diff dataset

To recap, we have two group - a treatment and a control group - and half of the observations in each group will be before the treatment and half of them after the treatment.

1.1.1 The coefficients

We can estimate a simple diff-in-diff model based on figure 1.1 to look at the various coefficients.

reg = lm(Y ~ T + P + T*P)

stargazer( reg, 
           type = "html", 
           dep.var.labels = ("Outcome"),
           column.labels = c(""),
           covariate.labels = c("Intercept (B0)", "Treatment (B1)", "Post-treatment (B2)", "Diff in Diff (B3)"),
           omit.stat = "all", 
           digits = 2, intercept.bottom = FALSE )
Dependent variable:
Outcome
Intercept (B0) 35.00***
(0.00)
Treatment (B1) 15.00***
(0.00)
Post-treatment (B2) 20.00***
(0.00)
Diff in Diff (B3) 15.00***
(0.00)
Note: p<0.1; p<0.05; p<0.01

Each coefficient represents a group mean or a difference between two means. To facilitate the understanding of each coefficient, we can use the following graphs of our equation (1.2) and look at each coefficient individually (you can see the values of the coefficients on the left side of the graph).

\(\beta_0\) represents the average outcome of the control group before the treatment.

Difference in difference, beta 0

Figure 1.4: Difference in difference, beta 0

\(\beta_1\) represents the difference between the treatment and the control group before the treatment In this case, \(\beta_1\) is be equal to 15, such that \(\beta_0\) + \(\beta_1\) = 50, the average outcome of the treatment group before the treatment.

Difference in difference, beta 1

Figure 1.5: Difference in difference, beta 1

\(\beta_2\) represents how much the average outcome of the control group has changed in the post-treatment period. The average outcome of the control group in the post-treatment period will be equal to the initial (pre-treatment) mean - \(\beta_0\) in figure1.4 - plus \(\beta_2\).

Note that \(beta_2\) represent the trend over time, or the gains that occur over time independent from the treatment.

Difference in difference, beta 2

Figure 1.6: Difference in difference, beta 2

Finally, \(\beta_3\) is the key parameter that we are interested in estimating. It represents how much the average outcome of the treatment group has changed in the period after the treatment, compared to what it would happened to the same group had the intervention not occurred. If \(beta_3\) = 0, we can conclude that the treatment had no effect.

Note that \(\beta_3\) represents the difference; to calculate the average outcome of the treatment group after the treatment, we need to add the initial average outcome (\(beta_0\) + \(beta_1\)), the gains over time independent from the treatment (\(beta_2\)) and the gains because of the treatment (\(beta_3\)).

Diff-in-Diff, beta 3

Figure 1.7: Diff-in-Diff, beta 3

Note that each of the coefficient tests a different hypothesis - it tells us something fundamentally different about the model.

Graph \(\beta\) HYPOTHESES
Graph 1.4 \(b_0\) Is the average outcome of the control group before the treatment \(\ne\) 0?
Graph 1.5 \(b_1\) Is the difference between the control and treatment group before the treatment \(\ne\) 0?
Graph 1.6 \(b_2\) Is the difference between the average outcome of the control group before and after the treatment \(\ne\) 0?
Graph 1.7 \(b_3\) Difference in difference estimator. Does the treatment have an impact?

1.2 The counterfactual

To better understand \(beta_3\) represented in graph 1.7, we need to give a better look at the counterfactual. The counterfactual what it would have occured to Y, had the policy intervention not happened; in the diff-in-diff model, the counterfactual is the outcome of the intervention group, had the intervention not occured. \(beta_3\) represents the difference between the counterfactual and the average outcome of the treatment group in the period after the treatment.

You can see it in figure 1.8.

DID counterfactual

Figure 1.8: DID counterfactual

We can find the value of the counterfactual using the equation (1.1).

\[\begin{equation} \text{(Treatment_post - Treatment_pre) = (Control_post - Control_pre)} \tag{1.3} \end{equation}\]

\[\begin{equation*} \text{(X - 50) = (55 - 35)} \end{equation*}\]

\[\begin{equation*} \text{(X - 50) = 20} \end{equation*}\]

\[\begin{equation*} \text{ X = 20 + 50 = 70} \end{equation*}\]


We can also calculate the counterfactual using the coefficients estimated from our statistical model.

\[\begin{equation} \text{Counterfactual} = \beta_0 + \beta_1 + \beta_2 \tag{1.4} \end{equation}\]

Counterfactual, betas

Figure 1.9: Counterfactual, betas

2 A Difference-in-Difference model

A housing project in [New York](https://www1.nyc.gov/site/nycha/about/100-percent-affordable-housing.page)

Figure 2.1: A housing project in New York

The U.S. Dept. of Housing and Urban Development, along with state and local governments, subsidize several housing projects to create affordable living spaces. This policy has become increasingly important in the past yeas, as rent and housing costs increase in several US cities.

However, there are several critics to housing programs, including the belief that the development of affordable housing might cause a decline in housing value in nearby neighborhoods. There are several theories behind this idea, including:

We can test the hypothesis that housing projects reduce housing value using a difference-in-difference model (you can see a great example in this work conducted by Ellen and colleagues in 2007).

We are going to simulate some housing prices data for 2 groups of houses: houses that are located close to a subsized housing site (treatment group) and houses that are far away from a subsized housing site (control group). We are going to look at whether prices have changed before and after the creation of the subsized housing site.

Research question: Do affordable housing projects reduce prices of nearby houses?

Hypothesis: Houses nearby affordable housing projects will report a greater decrease in price over time.

2.1 Data

We can have a look at the dataset

head(data, 10)
##    House_Price Group Post_Treatment
## 1     213679.8     0              0
## 2     229355.4     0              1
## 3     190580.1     1              0
## 4     188669.0     1              1
## 5     253573.7     0              0
## 6     320094.0     0              1
## 7     188525.7     1              0
## 8     282998.4     1              1
## 9     184424.0     0              0
## 10    290409.4     0              1

The dataset contains three variables:

Variable name Description
\(\text{House Price}\) House pricing
\(\text{Group}\) Houses close to a subsized housing site (=1) and houses far from a subsized housing site (=0)
\(\text{Post_Treatment}\) Subsized housing site is not constructed (=0) or subsized housing site is open (=1)

2.2 Analysis

Our model is based on the equation (1.2):

\(\text{Y} = \beta_0 + \beta_1*Group + \beta_2*Treatment + \beta_3 * Group * Treatment + \text{e}\)
Dependent variable:
Housing prices
Intercept (B0) 216,672.60***
(3,742.00)
Treatment group (B1) 16,461.22***
(5,291.98)
Post-treatment (B2) 35,230.86***
(5,291.98)
Diff in Diff (B3) 34,066.02***
(7,484.00)
Note: p<0.1; p<0.05; p<0.01

2.3 Discussion of the results

As we discussed in our section 1.1 each coefficient tests a different hypothesis. By looking at the results we should be able to answer the following questions:

  1. Is the comparison group in Time = 1 different from zero?
  2. Did the treatment and comparison groups differ significantly in the first time period? In other words, do houses in the treatment group cost more than houses in the control group?
  3. Did the comparison group change significantly between the first and second time periods? In other words, do the prices of houses in the control group increase over time?
  4. Is the treatment group in Time = 2 different from the counterfactual? In other words, does the construction of an affordable housing site change the price of the nearby houses? If our hypothesis is confirmed we would expect a negative and significant coefficient.

Answers:

  1. The coefficient \(\beta_0\) (the constant) is significant and different from zero. Houses in the control group have an average price of 216,672.60$ in time = 1.

  2. The coefficient \(\beta_1\) (Treatment group) is also significant and different from zero, telling us that at time = 1, houses in the treatment and control group had a different average price. Specifically, houses in the treatment group cost 16,461.22$ more than houses in the control group. The average price of houses in the treatment group is 216,672.6 + 16,461.22 = 233,133.80.

  3. The coefficient \(\beta_2\) (Post-treatment) is significant and different from zero. We can conclude that the price of houses in the control group increased from time = 1 to time = 2. The average price of houses in the control group increases of 35,230.86$ at time = 2. The average price of houses in the control group at time = 2 is 216,672.6 + 35,230.86 = 251,903.5

  4. Finally, the coefficient \(\beta_3\) (Diff-in-Diff) is significant and different from zero, which means that the construction of an affordable housing site change the price of nearby houses. Particularly, the price increases of 34,066.02$ more than it would have without the construction of the housing site.

2.4 The counterfactual

Based on our results, we can calculate the counterfactual using the formula (1.4):

\[\begin{equation} \text{Counterfactual} = \beta_0 + \beta_1 + \beta_2 \tag{1.2} \end{equation}\]

The counterfactual represents the price of the houses in the treatment group, had the construction of the housing site never happend. It is equal to 268,364.7. But because the intervention did occur the price of the houses is equal to 302,430.7 (the counterfactual + the diff-in-diff).

We can map all these numbers in a graph:

Results from the working example

Figure 2.2: Results from the working example

3 Replication of a study

We are going to replicate a study conducted by Card and Krueger in 1994 that investigate the relationship between a rise in minimum wage and employment.

Economic theories have long suggested that increases in the minimum wage lead to a reduction in the employment for at least two reasons:

These theories have found mixed support but the discussion is still very much open within the policy world, as states discuss the opportunity to rise their minimum wage to help local populations to face increasing living costs. Discussions are currently occuring in New Jersey and Illinois to raise the minimum wage to 15$/hour (New york has successfully passed this same raise in 2018).

One of the first study looking at this policy problem was Card and Krueger’s. They applied a difference-in-difference the design to look at two groups of fast-food restaurants:

They collected data before and after the minimum wage was approved. Data used in the study can be download here. #PROVIDE LINK

Research question: Do increases in the minimum wage affect employment?

Hypothesis: An increase in the minimum wage is negatively correlated with employment.

3.1 Data

We start by uploading the data

data = read.csv("DATA/DID_Example.csv")

These are the variables contained in the dataset.

Variable name Description
ID Unique identifier for fast food
Treatment Pre-treatment (=0) and post-treatment (=1)
Group 1 if NJ (treatment); 0 if PA (Control)
Empl # of full time employees
C.Owned If owned by a company (=1) or not (=0)
Hours.Opening Number hours open per day
Soda Price of medium soda, including tax
Fries price of small fries, including tax
Chain 1 = BK, 2 = KFC, 3 = Roys, 4 = Wendys
SouthJ South New Jersey
CentralJ Central New Jersey
NorthJ North New Jersey
PA1 Northeast suburbs of Philadelphia
PA2 Easton and other PA areas
Shore New Jersey Shore

Here are some basic statistics.

stargazer( data,  
           type = "html", 
           omit.summary.stat = c("p25", "p75"),
           digits = 2 )
Statistic N Mean St. Dev. Min Max
ID 820 246.51 148.14 1 522
Chain 820 2.12 1.11 1 4
SouthJ 820 0.23 0.42 0 1
CentralJ 820 0.15 0.36 0 1
NorthJ 820 0.43 0.49 0 1
PA1 820 0.09 0.28 0 1
PA2 820 0.10 0.31 0 1
Shore 820 0.09 0.28 0 1
Group 820 0.81 0.39 0 1
Empl 802 8.24 8.30 0.00 60.00
C.Owned 820 0.34 0.48 0 1
Hours.Opening 820 12.65 4.76 1.00 24.00
Soda 790 1.04 0.09 0.41 1.49
Fries 775 0.93 0.11 0.67 1.37
Treatment 820 0.50 0.50 0 1

From the descriptive statistics table, you can see that the fast-food restaurants in our data have, on average, 8 employees. Moreover, 80% of the fast-food restaurants are part of the intervention group (see the Group variable).

3.2 Analysis

We now estimate the difference-in-difference model based on the equation (1.2); note that along with the Group and Treatment variables, we also include a set of control variables to account for differences across restaurants. For instance, we include the variable “opening hours” (if fast food restaurants are open more hours they might need more employees) or prices for fries and sodas (more expensive fast food might have more money to hire more employees).

reg = lm(Empl ~
Group +
Treatment +           
Treatment * Group +
C.Owned +             
Hours.Opening +      
Soda +                 
Fries +               
as.factor(Chain) +               
SouthJ +              
CentralJ +             
NorthJ +               
PA1 +                 
PA2 +                 
Shore, 
data = data)
Dependent variable:
Full-time Employees
Model results
Constant (b0) 1.62
(4.66)
Treatment (b1) -3.07**
(1.29)
Post-treatment (b2) -1.32
(1.29)
C.Owned -0.13
(0.68)
Hours.Opening 0.55***
(0.10)
Soda 2.73
(4.26)
Fries -0.65
(4.07)
Burger King -0.60
(1.09)
Roys -1.80**
(0.91)
Wendy’s 2.23**
(1.05)
SouthJ -2.46***
(0.83)
CentralJ -1.37
(0.86)
PA1 -2.41*
(1.31)
Shore 0.84
(1.05)
Diff-in-Diff estimator (b3) 4.12***
(1.41)
Note: p<0.1; p<0.05; p<0.01

Note that two variables were omitted because of multiple

3.3 Discussion of the results

As before, test your knowledge with the following questions:

  1. Is the control group in Time=1 different from zero?
  2. Did the treatment and comparison groups differ significantly in the first time period?
  3. Did the comparison group change significantly between the first and second time periods?
  4. Is the treatment Group in Time=2 different from the counterfactual? In other words, does the program have impact?

You can see the correct answers in the code.

# 1. No. The coefficient b0 is not statistically significant suggesting that the mean of the control group is not statistically different from zero (see the variable "Constant").

# 2. Yes, the treatment group has, on average, 3 employees less than the control group. Beta1 is statistically significant (see the variable "Group").

# 3. No. Despite the coefficient b2 = -1.32, suggesting a decrease in the number of employees, it is not statistically significant - therefore, we can conclude that there was no change in the control group.

# 4. Yes. The difference-in-difference estimator is equal to 4, suggesting that after the treatment, employment for restaurant in New Jersey (our treatment group) saw an increase in employment equal to an average of 4 employees.

4 Additional resources

In this section we are going to look at some more cases of difference-in-difference as well as a key assumption that you need to consider.

4.1 Cases in difference-in-difference

We can look at some specific cases of difference-in-difference models in the following pictures. This will help you thinking about the different coefficients in the model.

In the first picture, the control group does not increase over time, such that \(\beta_1\) = 0 and the control group outcome is equal to \(\beta_0\) both before and after the treatment.

In the second picture, there is no difference between the control and the treatment group before the treatment. In this case, \(\beta_0\) = \(\beta_2\).

In the last picture, the program has no effect and we would expect \(\beta_3\) = 0 as both groups have increases at the same ration.

Multiple cases in a DID model

Figure 4.1: Multiple cases in a DID model

Statistically, you can see each case represented in the following table.

T1 <- seq(20,40)
Dependent variable:
moxy
Outcome
(1) (2) (3)
Intercept (A) 20.0*** 20.0*** 20.0***
(1.4) (1.4) (1.4)
Treatment Group (B) 10.0*** -0.0 10.0***
(1.9) (1.9) (1.9)
Post-Period (C) -0.0 5.0** 10.0***
(1.9) (1.9) (1.9)
Treat x Post (D) 10.0*** 15.0*** 0.0
(2.7) (2.7) (2.7)
Observations 84 84 84
Note: p<0.1; p<0.05; p<0.01

4.2 Parallel trend assumption

The difference-in-difference model assumes that - in the absence of treatment - the treatment and control group have a similar trend over time, so as we can calculate the counterfactual based on the changes in the control group (remember that the counterfactual is calculated by summing\(beta_0\), \(beta_1\), and \(beta_2\) as in equation (1.4). This is called the parallel trend assumption.

You can see what happens when the assumption is met or not met in the following pictures, where we assume that no treatment has occured.

In the first picture, the two groups have the same growth over time.

Parallel trend assumption: met

Figure 4.2: Parallel trend assumption: met

In a difference-in-difference model we would be able to correctly calculate the counterfactual based on the trend observed in the control group and we could obtain our diff-in-diff estimator.

Parallel trend assumption: counterfactual

Figure 4.3: Parallel trend assumption: counterfactual

In this second graph, the two groups have a different trend, with the first group growing more than the second group.

Parallel trend assumption: unmet

Figure 4.4: Parallel trend assumption: unmet

If we were to calculate the counterfactual based on the control group, we would misscalculate it and attribute a greater effect to the treatment. This could happen if we compare salary at T=0 and T=10 among individuals in different professions; or the likelihood to go to college between high and low achieving students. You need to pay attention to the selection of the control group.
Parallel trend assumption: wrong counterfactual

Figure 4.5: Parallel trend assumption: wrong counterfactual

There is no easy way to test the parallel line assumption. You can (1) use theory to make and argument or (2) if you have longitudinal data compare the trend of the control and treatment group before the intervention to see if they are similar.