generate_qualitative_data_did.Rd
Generate a synthetic data set with qualitative outcomes under a difference-in-differences design. The data include two time periods, a binary treatment indicator (applied only in the second period), and a matrix of covariates. The probability time shifts of the treated and control groups evolve similarly across the two time periods (parallel trends on the probability mass functions).
generate_qualitative_data_did(n, assignment, outcome_type)
A list storing a data frame with the observed data, the true propensity score, and the true probabilities of shift on the treated.
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_did computes linear predictors for each class using the covariates:
$$\eta_{mi} (d, s) = \beta_{m1}^d X_{i1} + \beta_{m2}^d X_{i2} + \beta_{m3}^d X_{i3}, \quad d = 0, 1, \quad s = t-1, t,$$
and then transforms \(\eta_{mi} (d, s)\) into valid probability distributions using the softmax function:
$$P(Y_{is}(d) = m | X_i) = \frac{\exp(\eta_{mi} (d, s))}{\sum_{m'} \exp(\eta_{m'i}(d, s))}, \quad d = 0, 1, \quad s = t-1, t.$$
It then generates potential outcomes \(Y_{it-1}(1)\), \(Y_{it}(1)\), \(Y_{it-1}(0)\), and \(Y_{it}(0)\) by sampling from {1, 2, 3} using \(P(Y_{is}(d) = m | X_i), \, d = 0, 1, \, s = t-1, t\).
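The softmax-and-sample step can be sketched as follows. This is only an illustration, not the package's internal code: the coefficient matrix `beta`, the helper `softmax_rows`, and the sample size are hypothetical choices, since the actual coefficient values are not stated here.

```r
# Illustrative sketch only: the coefficients below are made up and are
# not the ones used inside generate_qualitative_data_did.
set.seed(1986)
n <- 5
X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)  # three covariates per unit

# Hypothetical coefficients: one row per class m, one column per covariate.
beta <- matrix(c( 0.5, -0.2,  0.1,
                  0.0,  0.3, -0.4,
                 -0.1,  0.2,  0.6), nrow = 3, byrow = TRUE)

eta <- X %*% t(beta)  # n x 3 matrix of linear predictors

# Row-wise softmax; subtracting the row maximum avoids overflow in exp().
softmax_rows <- function(eta) {
  z <- exp(eta - apply(eta, 1, max))
  z / rowSums(z)
}
probs <- softmax_rows(eta)

# Draw one class from {1, 2, 3} per unit using that unit's probabilities.
y <- apply(probs, 1, function(p) sample(1:3, size = 1, prob = p))
```

Each row of `probs` sums to one, so it can be passed directly to `sample()` as the `prob` argument.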
If instead outcome_type == "ordered", generate_qualitative_data_did first generates latent potential outcomes:
$$Y_i^* (d, s) = \tau d + X_{i1} + X_{i2} + X_{i3} + N (0, 1), \quad d = 0, 1, \quad s = t-1, t,$$
with \(\tau = 2\). It then constructs \(Y_i (d, s)\) by discretizing \(Y_i^* (d, s)\) using threshold parameters \(\zeta_1 = 2\) and \(\zeta_2 = 4\), with the conventions \(\zeta_0 = -\infty\) and \(\zeta_3 = +\infty\). Then,
$$P(Y_i(d, s) = m | X_i) = P(\zeta_{m-1} < Y_i^*(d, s) \leq \zeta_m | X_i) = \Phi (\zeta_m - \sum_j X_{ij} - \tau d) - \Phi (\zeta_{m-1} - \sum_j X_{ij} - \tau d), \quad d = 0, 1, \quad s = t-1, t,$$
which allows us to analytically compute the probabilities of shift on the treated.
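As a rough check of the formula above, the class probabilities can be computed with `pnorm` and compared against simulated latent outcomes. The covariate profile `x` and the helper `class_probs` are hypothetical, and padding the thresholds with `-Inf` and `Inf` is a convention assumed for this sketch. The result is a shift conditional on one covariate profile; how the function aggregates over treated units is not reproduced here.

```r
# Sketch of the ordered design at a single, made-up covariate profile.
tau  <- 2
zeta <- c(-Inf, 2, 4, Inf)  # thresholds padded with -Inf and Inf (assumed convention)
x    <- c(0.5, 1.0, 0.2)    # hypothetical covariate values

# Analytic class probabilities P(Y(d) = m | X = x) for m = 1, 2, 3.
class_probs <- function(x, d) {
  diff(pnorm(zeta - sum(x) - tau * d))
}
p1 <- class_probs(x, d = 1)
p0 <- class_probs(x, d = 0)
p1 - p0  # shift in each class probability at this x

# Monte Carlo check: discretize simulated latent outcomes under d = 1.
set.seed(1986)
y_star <- tau * 1 + sum(x) + rnorm(1e5)
y <- cut(y_star, breaks = zeta, labels = FALSE)  # intervals (zeta_{m-1}, zeta_m]
prop.table(table(y))
```

The simulated class frequencies should be close to `p1`, and the entries of `p1 - p0` sum to zero because both vectors are probability distributions.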
## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_did(100,
                                      assignment = "observational",
                                      outcome_type = "ordered")
data$pshifts_treated
#> [1] -0.54695394 -0.06938273 0.61633667